Abstract. We develop a new direct approach to approximating suprema of
general empirical processes by a sequence of suprema of Gaussian processes,
without taking the route of approximating empirical processes themselves in
the sup-norm. We prove an abstract approximation theorem that is applicable
to a wide variety of problems, primarily in statistics. In particular, the
bound in the main approximation theorem is non-asymptotic and the theorem
does not require uniform boundedness of the class of functions. The proof of
the approximation theorem builds on a new coupling inequality for maxima of
sums of random vectors, the proof of which depends on an effective use of
Stein's method for normal approximation, and some new empirical process
techniques. We study applications of this approximation theorem to local
empirical processes and series estimation in nonparametric regression where
the classes of functions change with the sample size and are not
Donsker-type. Importantly, our new technique is able to prove the Gaussian
approximation for supremum type statistics under considerably weak
regularity conditions, especially concerning the bandwidth and the number of
series functions, in those examples.



1. Introduction 

This paper is concerned with the problem of approximating suprema of
empirical processes by a sequence of suprema of Gaussian processes. To
formulate the problem, let $X_1, \dots, X_n$ be i.i.d. random variables taking
values in a measurable space $(S, \mathcal{S})$ with common distribution $P$.
Suppose that there is a sequence $\mathcal{F}_n$ of classes of measurable
functions $S \to \mathbb{R}$, and consider the empirical process
$\mathbb{G}_n f = n^{-1/2} \sum_{i=1}^n (f(X_i) - E[f(X_i)])$ indexed by
$\mathcal{F}_n$. For the moment, we implicitly assume that each $\mathcal{F}_n$
is "nice" enough and set the measurability question aside. This paper tackles
the problem of approximating $Z_n = \sup_{f \in \mathcal{F}_n} \mathbb{G}_n f$
by a sequence of random variables $\tilde Z_n$ equal in distribution to
$\sup_{f \in \mathcal{F}_n} B_n f$, where each $B_n$ is a centered Gaussian
process indexed by $\mathcal{F}_n$ with covariance function
$E[B_n(f) B_n(g)] = E[f(X_1) g(X_1)]$ for $f, g \in \mathcal{F}_n$. We look for
conditions under which there is a sequence of such random variables
$\tilde Z_n$ with

$$|Z_n - \tilde Z_n| = O_P(r_n), \tag{1}$$

where $r_n \to 0$ as $n \to \infty$ is a sequence of constants.
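
As a concrete illustration of the two quantities being compared, the following sketch simulates $Z_n$ and its Gaussian analogue in the simplest classical case. It assumes, for illustration only, that $P$ is the uniform distribution on $[0,1]$ and that the class consists of the indicators $1_{[0,t]}$ with $t$ on a finite grid; the Gaussian process is then a Brownian bridge, whose covariance $\min(s,t) - st$ equals $\mathrm{Cov}(1\{X_1 \le s\}, 1\{X_1 \le t\})$. The function names, the grid, and the sample sizes are hypothetical choices, not part of the paper.

```python
import math
import random


def sup_empirical(sample, grid):
    """Z_n = max over t in grid of sqrt(n) * (F_n(t) - t), for data assumed uniform on [0, 1]."""
    n = len(sample)
    xs = sorted(sample)
    best = -math.inf
    k = 0  # number of observations <= current t (grid must be increasing)
    for t in grid:
        while k < n and xs[k] <= t:
            k += 1
        best = max(best, math.sqrt(n) * (k / n - t))
    return best


def sup_brownian_bridge(grid, rng):
    """One draw of max over t in grid of B(t), where Cov(B(s), B(t)) = min(s, t) - s * t."""
    w, prev, path = 0.0, 0.0, []
    for t in grid:  # Brownian motion W built from independent Gaussian increments
        w += rng.gauss(0.0, math.sqrt(t - prev))
        prev = t
        path.append(w)
    w_one = w + rng.gauss(0.0, math.sqrt(1.0 - prev))  # value of W at time 1
    return max(wt - t * w_one for wt, t in zip(path, grid))  # B(t) = W(t) - t * W(1)


if __name__ == "__main__":
    rng = random.Random(0)
    n, reps = 500, 200
    grid = [j / 100 for j in range(1, 100)]
    z = [sup_empirical([rng.random() for _ in range(n)], grid) for _ in range(reps)]
    z_tilde = [sup_brownian_bridge(grid, rng) for _ in range(reps)]
    print("Monte Carlo mean of Z_n:", sum(z) / reps)
    print("Monte Carlo mean of the Gaussian analogue:", sum(z_tilde) / reps)
```

The simulation only compares the two marginal distributions; it does not construct the coupling in (1), which places $Z_n$ and $\tilde Z_n$ on the same probability space.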

The study of the (asymptotic and non-asymptotic) behavior of the supremum of
the empirical process is one of the central issues in probability theory, and
dates back to the classical work of [12]. The (tractable) distributional
approximation of the supremum of the empirical process is of particular
importance in statistics. A leading example is uniform inference in
nonparametric estimation, such as construction of uniform confidence bands
and specification testing in nonparametric density and regression estimation
where critical values are given by quantiles of supremum type statistics [see,
e.g., 0, HE], [35], HO, 19]. Another interesting example appears in
econometrics, where there is an interest in estimating a parameter that is
given as the extremum of an unknown function such as a conditional mean
function. @] proposed a precision corrected estimate for such a parameter.
Constructing their estimate requires approximating quantiles of a supremum
type statistic, and here the Gaussian approximation of the supremum type
statistic plays a crucial role.

A related but different problem is that of approximating empirical processes
themselves by a sequence of Gaussian processes in the sup-norm. This problem
is stronger than (1). Indeed, (1) is implied if there is a sequence of
versions of $B_n$ (which we denote by the same symbol $B_n$) such that

$$\|\mathbb{G}_n - B_n\|_{\mathcal{F}_n} := \sup_{f \in \mathcal{F}_n} |(\mathbb{G}_n - B_n) f| = O_P(r_n). \tag{2}$$



There is a large literature on the problem (2). Notably, Komlós et al. yA
(henceforth abbreviated as KMT) proved that
$\|\mathbb{G}_n - B_n\|_{\mathcal{F}} = O_{a.s.}(n^{-1/2} \log n)$ for
$S = [0,1]$, $P$ = uniform distribution on $[0,1]$, and
$\mathcal{F} = \{1_{[0,t]} : t \in [0,1]\}$. See [H] and 0] for refinements of
KMT's result. 0, (13] and @] developed extensions of the KMT construction to
more general classes of functions.

The KMT construction is a powerful tool in addressing the problem (2), but
when applied to general empirical processes, it requires side conditions on
classes of functions and distributions. For example, Rio [35] required that
the $\mathcal{F}_n$ be classes of functions of uniformly bounded variation on
$S = [0,1]^d$, and that $P$ have a continuous and positive Lebesgue density on
$[0,1]^d$. Such side conditions are essential to the KMT construction since
it depends crucially on the Haar approximation of indicator functions and
binomial coupling inequalities of Tusnády. [l| and [lf| considered the
problem of Gaussian approximation of general empirical processes with
different approaches and thereby without such side conditions. [13] used a
finite approximation of a (possibly uncountably) infinite class of functions
and applied a coupling inequality of [42] to the discretized empirical process
(more precisely, Dudley and Philipp used a version of Yurinskii's inequality
proved by [HI])- and [36], on the other hand, used a coupling inequality of
[43] instead of Yurinskii's and some recent empirical process techniques such
as Talagrand's [39] concentration inequality, which leads to refinements of
Dudley and Philipp's results in some cases. However, the "global" rates that
U ] , [H and (H] established do not lead to tight conditions for the Gaussian
approximation to, say, the supremum deviation of kernel type statistics.

We develop here a new direct approach to the problem (1), without taking
the route of approximating empirical processes themselves in the sup-norm
and with technical tools different from those used in the aforementioned
papers. We prove an abstract approximation theorem (Theorem 2.1) that
leads to results of type (1) in several situations. The proof of the
approximation theorem builds on a number of technical tools that are of
interest in their own right: notably, 1) a new coupling inequality for maxima
of sums of random vectors (Theorem 4.1), in which Stein's method for normal
approximation, originally due to [37, 38], plays an important role; 2) a
deviation inequality for suprema of empirical processes that only requires
finite moments of envelope functions (Theorem 5.1), due essentially to the
recent work of [3], complemented with a new "local" maximal inequality for
the expectation of suprema of empirical processes that extends the work of
[4~H (Theorem 5.2). We study applications of this approximation theorem to
local empirical processes and series estimation in nonparametric regression.
We find that our new technique is able to prove the Gaussian approximation
for the supremum type statistics under considerably weak regularity
conditions, especially concerning the bandwidth and the number of series
functions, in those examples.

It is instructive to briefly summarize here the key features of the main
approximation theorem. First, the theorem establishes a non-asymptotic
bound between $Z_n$ and its Gaussian analogue $\tilde Z_n$. The theorem
requires that each $\mathcal{F}_n$ is pre-Gaussian (i.e., there is a version of
$B_n$ that is a tight Gaussian random element in
$\ell^\infty(\mathcal{F}_n)$; see below for the notation), but allows for the
case where the "complexity" of $\mathcal{F}_n$ increases with $n$ and the
$\mathbb{G}_n$ process does not even have a limit process (in a suitable
sense) asymptotically. Second, the theorem only requires finite moments of
the envelope function, which should be contrasted with [23], [H, El, Hfl]
where the class of functions must be uniformly bounded. Hence the theorem is
applicable to a wider class of problems, including ones to which the previous
results in those works are not applicable. Third, the bound in Theorem 2.1 is
able to exploit the "local" property of the class of functions; thus, when
applied to, say, the supremum deviation of kernel type statistics, it leads
to tight conditions on the bandwidth for the Gaussian approximation (see the
discussion after Theorem 2.1 for details about these features).

In this paper, we substantially rely on modern empirical process theory.
For general references on empirical process theory, we refer to [27], [13] and
0. In particular, [12], Section 9.5, gives excellent historical remarks on the
Gaussian approximation of empirical processes. For a textbook treatment
of Yurinskii's and KMT's couplings, we refer to [13], Chapter 10.

1.1. Organization. The rest of the paper is organized as follows. In
Section 2, we present the main approximation theorem (Theorem 2.1). We give
a proof of Theorem 2.1 in Section 6. In Section 3, we study applications of
Theorem 2.1 to local empirical processes and series estimation in
nonparametric regression. Sections 4 and 5 are devoted to developing some
technical tools needed to prove Theorem 2.1. In Section 4, we prove a new
coupling inequality for maxima of sums of random vectors, and in Section 5,
we prove some inequalities for empirical processes. Some proofs are deferred
to the Appendix.

1.2. Notation. We shall use the following notation. Let
$(\Omega, \mathcal{A}, \mathbb{P})$ denote an underlying probability space. We
assume that the probability space $(\Omega, \mathcal{A}, \mathbb{P})$ is "rich
enough" in the sense that there is a uniform random variable on $(0,1)$
defined on $(\Omega, \mathcal{A}, \mathbb{P})$ independent of the sample at
hand. For a real-valued random variable $\xi$, let
$\|\xi\|_q = (E[|\xi|^q])^{1/q}$, $1 \le q < \infty$. For two random variables
$\xi$ and $\eta$, we write

$$\xi \stackrel{d}{=} \eta$$

if they have the same distribution.

For any probability measure $Q$ on a measurable space $(S, \mathcal{S})$, we
use the notation $Qf := \int f \, dQ$. Let $\mathcal{L}^p(Q)$, $p \ge 1$, denote
the space of all measurable functions $f : S \to \mathbb{R}$ such that
$\|f\|_{Q,p} := (Q|f|^p)^{1/p} < \infty$. We also use the notation
$\|f\|_\infty := \sup_{x \in S} |f(x)|$. Denote by $e_Q$ the
$\mathcal{L}^2(Q)$-semimetric:

$$e_Q(f, g) = \|f - g\|_{Q,2}, \quad f, g \in \mathcal{L}^2(Q).$$

For an arbitrary set $T$, let $\ell^\infty(T)$ denote the space of all bounded
functions $T \to \mathbb{R}$, equipped with the uniform norm
$\|f\|_T := \sup_{t \in T} |f(t)|$. We endow $\ell^\infty(T)$ with the Borel
$\sigma$-field induced from the norm topology. A random element in
$\ell^\infty(T)$ refers to a Borel measurable map from $\Omega$ to
$\ell^\infty(T)$. For $\varepsilon > 0$, an $\varepsilon$-net of a semimetric
space $(T, d)$ is a subset $T_\varepsilon$ of $T$ such that for every
$t \in T$ there exists a $t_\varepsilon \in T_\varepsilon$ with
$d(t, t_\varepsilon) < \varepsilon$. The $\varepsilon$-covering number
$N(T, d, \varepsilon)$ of $T$ is the infimum of the cardinality of
$\varepsilon$-nets of $T$, i.e.,
$N(T, d, \varepsilon) := \inf\{\mathrm{Card}(T_\varepsilon) : T_\varepsilon \text{ is an } \varepsilon\text{-net of } T\}$.
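
The definition of an $\varepsilon$-net can be made concrete for a finite set of points. The sketch below, written under the assumption that $T$ is a finite list and `dist` is a user-supplied semimetric (both hypothetical inputs, as are the function names), builds an $\varepsilon$-net greedily. The size of any such net is an upper bound on the covering number $N(T, d, \varepsilon)$; the greedy net is not claimed to attain the infimum.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def greedy_eps_net(points: Sequence[T], dist: Callable[[T, T], float], eps: float) -> List[T]:
    """Return an eps-net of `points`: every point is within distance < eps of some center."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    centers: List[T] = []
    for t in points:
        # Keep t as a new center only if no existing center already covers it.
        if all(dist(t, c) >= eps for c in centers):
            centers.append(t)
    return centers


def is_eps_net(net: Sequence[T], points: Sequence[T], dist: Callable[[T, T], float], eps: float) -> bool:
    """Check the defining property: for every t there is a center c with dist(t, c) < eps."""
    return all(any(dist(t, c) < eps for c in net) for t in points)
```

For example, with the points 0, 1, 2, 3 on the real line and $\varepsilon = 1.5$, the greedy pass keeps 0 and 2 as centers, so the covering number is at most 2 in that case.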

The standard Euclidean norm is denoted by $|\cdot|$. The transpose of a vector
$x$ is denoted by $x^T$. For a smooth function $f : \mathbb{R}^p \to \mathbb{R}$,
we use the notation $\partial_j f(x) = \partial f(x)/\partial x_j$,
$\partial_j \partial_k f(x) = \partial^2 f(x)/\partial x_j \partial x_k$, and so on.

For a subset $A$ of a semimetric space $(U, \rho)$, let $A^\delta$ denote the
$\delta$-enlargement of $A$, i.e., $A^\delta = \{x \in U : \rho(x, A) \le \delta\}$
where $\rho(x, A) = \inf_{y \in A} \rho(x, y)$.

We write $a \lesssim b$ if there is a universal constant $C > 0$ such that
$a \le Cb$. For a given parameter $q$, $a \lesssim_q b$ if there is a constant
$C(q) > 0$ depending only on $q$ such that $a \le C(q) b$. For
$a, b \in \mathbb{R}$, $a \vee b = \max\{a, b\}$, $a_+ = a \vee 0$. Unless
otherwise stated, $c > 0$ and $C > 0$ denote universal constants whose values
may change from line to line.

Lastly, for an arbitrary sequence $\{z_i\}_{i=1}^n$, we write
$E_n[z_i] = n^{-1} \sum_{i=1}^n z_i$. For example,
$E_n[f(X_i)] = n^{-1} \sum_{i=1}^n f(X_i)$.

2. Abstract approximation theorem 

We begin by reviewing the setup. Let $X_1, \dots, X_n$ be i.i.d. random
variables taking values in a measurable space $(S, \mathcal{S})$ with common
distribution $P$. In what follows, we assume $n \ge 3$. Suppose that there is
a class $\mathcal{F}$ of measurable functions $S \to \mathbb{R}$, to which a
measurable envelope $F$ is attached, i.e., $F$ is a non-negative measurable
function $S \to \mathbb{R}$ such that

$$F(x) \ge \sup_{f \in \mathcal{F}} |f(x)|, \quad \forall x \in S.$$

Consider the (uniform) entropy integral

$$J(\delta) = J(\delta, \mathcal{F}, F) = \int_0^\delta \sup_Q \sqrt{1 + \log N(\mathcal{F}, e_Q, \varepsilon \|F\|_{Q,2})} \, d\varepsilon,$$

where the supremum is taken over all distributions $Q$ (note: $\sup_Q$ can be
replaced by the supremum over all finite discrete distributions). In this
section the sample size $n$ is fixed, and hence the possible dependence of
$\mathcal{F}$ and $F$ (and other quantities) on $n$ is dropped.
We make the following assumptions. 

(A1) The class $\mathcal{F}$ is pointwise measurable, i.e., it contains a
countable subset $\mathcal{G}$ such that for every $f \in \mathcal{F}$ there
exists a sequence $g_m \in \mathcal{G}$ with $g_m(x) \to f(x)$ for every
$x \in S$.

(A2) For some $q \ge 2$, $F \in \mathcal{L}^q(P)$.

(A3) $J(1, \mathcal{F}, F) < \infty$.

Assumption (A1) is made to avoid measurability complications. See
Section 2.3.1 of [40] for further discussion. This assumption ensures that,
e.g., $\sup_{f \in \mathcal{F}} \mathbb{G}_n f = \sup_{f \in \mathcal{G}} \mathbb{G}_n f$,
and hence the former supremum is a measurable map from $\Omega$ to
$\mathbb{R}$. Assumptions (A2) and (A3) in particular ensure that
$\mathcal{F}$ is $P$-pre-Gaussian:

Lemma 2.1. Under assumptions (A2) and (A3), the class $\mathcal{F}$ is
$P$-pre-Gaussian, i.e., there is a tight Gaussian random element $G_P$ in
$\ell^\infty(\mathcal{F})$ with mean zero and covariance function
$E[G_P(f) G_P(g)] = P(f - Pf)(g - Pg)$ for $f, g \in \mathcal{F}$.

Proof. This is a standard fact. For the sake of completeness, we shall verify
this lemma in the Appendix. □

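Here is the main theorem. For notational convenience, let us write

$$H_n(\varepsilon) = \log\big(N(\mathcal{F}, e_P, \varepsilon \|F\|_{P,2}) \vee n\big).$$

Also write $M = \max_{1 \le i \le n} F(X_i)$ and
$\mathcal{F} \cdot \mathcal{F} = \{fg : f \in \mathcal{F}, g \in \mathcal{F}\}$.

Theorem 2.1. Suppose that assumptions (A1), (A2) with $q \ge 3$, and (A3)
are satisfied. Let $Z = \sup_{f \in \mathcal{F}} \mathbb{G}_n f$. Let
$\kappa > 0$ be any positive constant such that
$\kappa^3 \ge E[\|E_n[|f(X_i)|^3]\|_{\mathcal{F}}]$. Then for every
$\varepsilon \in (0, 1]$ and $\gamma \in (0, 1)$, there is a random variable
$\tilde Z \stackrel{d}{=} \sup_{f \in \mathcal{F}} G_P f$ such that

$$\mathbb{P}\{|Z - \tilde Z| > K(q) \Delta_n(\varepsilon, \gamma)\} \le \gamma \Big\{1 + \tfrac{1}{4} P\big[(F/\kappa)^3 1(F/\kappa > c \gamma^{-1/3} n^{1/3} H_n(\varepsilon)^{-1/3})\big]\Big\} + \frac{C \log n}{n},$$

where $K(q) > 0$ is a constant that depends only on $q$ and

$$\begin{aligned}
\Delta_n(\varepsilon, \gamma) ={}& J(\varepsilon) \|F\|_{P,2} + n^{-1/2} \varepsilon^{-2} J^2(\varepsilon) \|M\|_2 \\
&+ \gamma^{-1/q} \varepsilon \|F\|_{P,2} + n^{-1/2} \gamma^{-1/q} \|M\|_q + n^{-1/2} \gamma^{-2/q} \|M\|_2 \\
&+ n^{-1/4} \gamma^{-1/2} (E[\|\mathbb{G}_n\|_{\mathcal{F} \cdot \mathcal{F}}])^{1/2} H_n^{1/2}(\varepsilon) + n^{-1/6} \gamma^{-1/3} \kappa H_n^{2/3}(\varepsilon).
\end{aligned}$$

Proof. See Section 6. □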

Remark 2.1. 1. Theorem 2.1 is a non-asymptotic result. In applications
$\mathcal{F}$ and $F$ (and even $S$) may change with $n$, i.e.,
$\mathcal{F} = \mathcal{F}_n$ and $F = F_n$. In that case, assumption (A3) is
interpreted as $J(1, \mathcal{F}_n, F_n) < \infty$ for each $n$, but it allows
for the case where $J(1, \mathcal{F}_n, F_n)$ diverges as $n \to \infty$.

2. The factor 1/4 on the right side has no special meaning. It can be
replaced by a smaller positive constant, but at the cost of increasing the
constant $K(q)$. We do not pursue the generality in this direction.

3. By [12], Theorem 3.1.1, one can extend $G_P$ to the linear hull of
$\mathcal{F}$ in such a way that $G_P$ has linear sample paths. Hence

$$\|\mathbb{G}_n\|_{\mathcal{F}} = \sup_{f \in \mathcal{F} \cup (-\mathcal{F})} \mathbb{G}_n f, \qquad \|G_P\|_{\mathcal{F}} = \sup_{f \in \mathcal{F} \cup (-\mathcal{F})} G_P f,$$

where $-\mathcal{F} := \{-f : f \in \mathcal{F}\}$, from which one can readily
deduce the following corollary. Henceforth we only deal with
$\sup_{f \in \mathcal{F}} \mathbb{G}_n f$.

Corollary 2.1. The conclusion of Theorem 2.1 continues to hold with $Z$
replaced by $Z = \|\mathbb{G}_n\|_{\mathcal{F}}$, $\tilde Z$ replaced by
$\tilde Z \stackrel{d}{=} \|G_P\|_{\mathcal{F}}$, and with different constants.

When applying Theorem 2.1, one has to derive suitable bounds on

$$E[\|E_n[|f(X_i)|^3]\|_{\mathcal{F}}] \quad \text{and} \quad E[\|\mathbb{G}_n\|_{\mathcal{F} \cdot \mathcal{F}}].$$

We can simply bound these terms by

$$E[\|E_n[|f(X_i)|^3]\|_{\mathcal{F}}] \le \|F\|_{P,3}^3, \qquad E[\|\mathbb{G}_n\|_{\mathcal{F} \cdot \mathcal{F}}] \lesssim J(1, \mathcal{F}, F) \|F\|_{P,4}^2.$$

The latter estimate is deduced from Theorem 2.14.1 of [40] together with
the fact that

$$\sup_Q N(\mathcal{F} \cdot \mathcal{F}, e_Q, 2\varepsilon \|F^2\|_{Q,2}) \le \sup_Q N^2(\mathcal{F}, e_Q, \varepsilon \|F\|_{Q,2}). \tag{3}$$

See Lemma A.1 in the Appendix. These simple bounds are, however, too
crude when $PF^3$ and $PF^4$ are significantly larger than the "weak"
moments $\sup_{f \in \mathcal{F}} P|f|^3$ and $\sup_{f \in \mathcal{F}} Pf^4$,
respectively, which is the case for all the examples studied in Section 3.
The following lemma will be useful for handling such cases.

Lemma 2.2. Suppose that assumptions (A1), (A2) with $q = 4$, and (A3)
are satisfied. For $k = 3, 4$, let $\delta_k \in (0, 1]$ be any positive
constant such that
$\delta_k \ge \sup_{f \in \mathcal{F}} \|f\|_{P,k} / \|F\|_{P,k}$. Then

E[||E„[|/(X,)| 3 ]|b]-su P P|/| 3 



rvj II 1 1 o 



and 



n\\Gn\M < Ast,T,F)\\Ff PA + \\ M \\ijyy,F) _ 

Proof. See Appendix. □ 

Before going to the applications, we discuss the key features of Theorem
2.1. First, Theorem 2.1 does not require uniform boundedness of
$\mathcal{F}$, and requires only finite moments of the envelope function. At
first sight, this may not sound surprising. However, most papers working on
the Gaussian approximation of empirical processes in the sup-norm, such as
[23], H, El, Hif, required that classes of functions be uniformly bounded.
There are, however, many statistical applications where uniform boundedness
of the class of functions is too restrictive, and the generality of Theorem
2.1 in this direction will turn out to be useful. The cost of this generality
is that $\gamma$, which in applications we take as $\gamma = \gamma_n \to 0$,
is typically at most $O(n^{-1/6})$, and hence Theorem 2.1 gives only "in
probability bounds" rather than "almost sure bounds" (inspection of the proof
shows that the $n^{-1} \log n$ term can be replaced by $n^{-2}$, at the cost of
increasing the constant $K(q)$). However, this should be considered as a
consequence of the generality of allowing for non-uniformly bounded classes
of functions, and should not be considered as a genuine drawback of the
theorem. The second feature of Theorem 2.1 is that it is able to exploit the
"local" property of the class of functions $\mathcal{F}$. By Lemma 2.2,
typically, we may take $\kappa^3 \approx \sup_{f \in \mathcal{F}} P|f|^3$, and
$E[\|\mathbb{G}_n\|_{\mathcal{F} \cdot \mathcal{F}}] \approx \sup_{f \in \mathcal{F}} \sqrt{Pf^4}$
(up to a log term). In some applications, e.g., the supremum deviation of
kernel type statistics, the class $\mathcal{F} = \mathcal{F}_n$ changes with
$n$ and $\sup_{f \in \mathcal{F}_n} P|f|^k$ with $k = 3, 4$ decrease to 0
(while the envelope functions $F_n$ are such that $PF_n^k = O(1)$ for
$k = 3, 4$). The bound in Theorem 2.1 can effectively exploit this
information and lead to tight conditions on, say, the bandwidth, for the
Gaussian approximation. This feature will become clear from the proofs for
the applications in the following section.






CHERNOZHUKOV, CHETVERIKOV, AND KATO 



3. Applications 

This section studies applications of Theorem 2.1 to local empirical
processes and series estimation in nonparametric regression. All the proofs
in this section are gathered in the Appendix. In both examples, the classes
of functions change with the sample size $n$ and the corresponding
$\mathbb{G}_n$ processes do not have tight limits. Hence regularity conditions
for the Gaussian approximation for the suprema will be of interest.

3.1. Local empirical processes. This section applies Theorem 2.1 to the
supremum deviation of kernel type statistics. Let
$(Y_1, X_1), \dots, (Y_n, X_n)$ be i.i.d. random variables taking values in the
product space $\mathcal{U} \times \mathbb{R}^d$, where
$(\mathcal{U}, \mathscr{U})$ is an arbitrary measurable space. Suppose that
there is a class $\mathcal{G}$ of measurable functions
$\mathcal{U} \to \mathbb{R}$. Let $k(\cdot)$ be a kernel function on
$\mathbb{R}^d$. By "kernel function", we simply mean that $k(\cdot)$ is
integrable with respect to the Lebesgue measure on $\mathbb{R}^d$. We do not
ask $k(\cdot)$ to be non-negative nor the integral of $k(\cdot)$ over
$\mathbb{R}^d$ to be 1. Let $h_n$ be a sequence of positive constants such
that $h_n \to 0$ as $n \to \infty$, and let $I$ be an arbitrary Borel subset of
$\mathbb{R}^d$. Consider the kernel-type statistics

$$S_n(x, g) = \frac{1}{n h_n^d} \sum_{i=1}^n g(Y_i) k(h_n^{-1}(X_i - x)), \quad (x, g) \in I \times \mathcal{G}.$$
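
The kernel-type statistic can be computed directly from its definition. The following is a minimal sketch for scalar covariates ($d = 1$); the triangular kernel, the function names, and the argument order are illustrative assumptions rather than anything prescribed by the paper.

```python
from typing import Callable, Sequence


def triangular_kernel(u: float) -> float:
    """k(u) = max(1 - |u|, 0): a bounded, continuous kernel that integrates to 1."""
    return max(1.0 - abs(u), 0.0)


def kernel_statistic(
    x: float,
    g: Callable[[float], float],
    ys: Sequence[float],
    xs: Sequence[float],
    h: float,
    kernel: Callable[[float], float] = triangular_kernel,
) -> float:
    """S_n(x, g) = (n * h)^{-1} * sum_i g(Y_i) * k((X_i - x) / h), for d = 1."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    return sum(g(y) * kernel((xi - x) / h) for y, xi in zip(ys, xs)) / (n * h)
```

Passing `g = lambda y: 1.0` gives a kernel density estimate at `x`, while `g = lambda y: y` gives the numerator of a kernel regression estimate, matching the special cases discussed in the text.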

When the kernel function $k(\cdot)$ is such that
$\int_{\mathbb{R}^d} k(t) \, dt = 1$, under suitable regularity conditions,
$S_n(x, g)$ will be a consistent estimator of $E[g(Y_1) \mid X_1 = x] p(x)$,
where $p(\cdot)$ denotes a Lebesgue density of the distribution of $X_1$. For
example, when $g \equiv 1$, $S_n(x, g)$ will be a consistent estimator of
$p(x)$; when $\mathcal{U} = \mathbb{R}$ and $g(y) = y$, $S_n(x, g)$ will be a
consistent estimator of $E[Y_1 \mid X_1 = x] p(x)$; and when
$\mathcal{U} = \mathbb{R}$ and $g(\cdot) = 1(\cdot \le y)$, $y \in \mathbb{R}$,
$S_n(x, g)$ will be a consistent estimator of
$\mathbb{P}(Y_1 \le y \mid X_1 = x) p(x)$. In statistical applications, it is
often of interest to approximate the distribution of the following quantity:

$$W_n = \sup_{(x, g) \in I \times \mathcal{G}} c_g(x) \sqrt{n h_n^d} \, (S_n(x, g) - E[S_n(x, g)]),$$

where $c_g(x)$ is a suitable normalizing constant. A typical choice of
$c_g(x)$ would be such that

$$\mathrm{Var}\big(\sqrt{n h_n^d} \, S_n(x, g)\big) = c_g(x)^{-2} + o(1).$$

Limit theorems on $W_n$ are developed in 0], [IH], 0, and [II], among others.

[13] called the process
$g \mapsto \sqrt{n h_n^d} \, (S_n(x, g) - E[S_n(x, g)])$ a "local" empirical
process at $x$ (the original definition of the local empirical process in
[]1| is slightly more general in that $h_n$ is replaced by a sequence of
bimeasurable functions). With a slight abuse of terminology, we also call the
process $(x, g) \mapsto \sqrt{n h_n^d} \, (S_n(x, g) - E[S_n(x, g)])$ a local
empirical process.

We consider the problem of approximating $W_n$ by a sequence of suprema
of certain Gaussian processes. For each $n \ge 1$, let $B_n$ be a centered
Gaussian process indexed by $I \times \mathcal{G}$ with covariance function

$$E[B_n(x, g) B_n(\check x, \check g)] = h_n^{-d} c_g(x) c_{\check g}(\check x) \, \mathrm{Cov}\big[g(Y_1) k(h_n^{-1}(X_1 - x)), \check g(Y_1) k(h_n^{-1}(X_1 - \check x))\big]. \tag{4}$$

Intuitively, it is expected that under suitable regularity conditions, there
is a sequence $\tilde W_n$ of random variables such that
$\tilde W_n \stackrel{d}{=} \sup_{(x, g) \in I \times \mathcal{G}} B_n(x, g)$
and, as $n \to \infty$, $|W_n - \tilde W_n| \stackrel{P}{\to} 0$. We shall argue
the validity of this approximation with explicit rates.

Before stating the assumptions, we recall the notion of VC type class. 

Definition 3.1. Let $\mathcal{F}$ be a class of measurable functions on a
measurable space $(S, \mathcal{S})$, to which a measurable envelope $F$ is
attached. We say that $\mathcal{F}$ is VC type with envelope $F$ if there are
constants $a > 0$ and $v > 0$ such that
$\sup_Q N(\mathcal{F}, e_Q, \varepsilon \|F\|_{Q,2}) \le (a/\varepsilon)^v$ for
all $0 < \varepsilon \le 1$.
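
For a VC type class, the covering number bound in Definition 3.1 gives $\log N \le v \log(a/\varepsilon)$, so the uniform entropy integral of Section 2 satisfies $J(\delta) \le \int_0^\delta \sqrt{1 + v \log(a/\varepsilon)} \, d\varepsilon$. The sketch below evaluates this upper bound numerically with a midpoint rule. The function name and the default number of steps are hypothetical, and the code assumes $a \ge 1$ (so the logarithm is non-negative on $(0, 1]$); it also accepts $v = 0$ purely as a sanity check.

```python
import math


def vc_entropy_integral_bound(delta: float, a: float, v: float, steps: int = 100_000) -> float:
    """Midpoint-rule value of the integral of sqrt(1 + v * log(a / eps)) over eps in (0, delta)."""
    if not (0.0 < delta <= 1.0):
        raise ValueError("delta must lie in (0, 1]")
    if a < 1.0 or v < 0.0:
        raise ValueError("this sketch assumes a >= 1 and v >= 0")
    width = delta / steps
    total = 0.0
    for i in range(steps):
        eps = (i + 0.5) * width  # midpoint of the i-th subinterval, never equal to 0
        total += math.sqrt(1.0 + v * math.log(a / eps))
    return total * width
```

The integrand has only a mild (integrable) singularity at $\varepsilon = 0$, which is why the bound is finite for every fixed $a$ and $v$; this is the reason assumption (A3) holds automatically for VC type classes.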

We make the following assumptions. 

(B1) $\mathcal{G}$ is a pointwise measurable class of functions
$\mathcal{U} \to \mathbb{R}$ uniformly bounded by a constant $b > 0$, and is
VC type with envelope $\equiv b$.

(B2) $k(\cdot)$ is a bounded and continuous kernel function on
$\mathbb{R}^d$, and such that the class of functions
$\mathcal{K} = \{t \mapsto k(ht + x) : h > 0, x \in \mathbb{R}^d\}$ is VC type
with envelope $\equiv \|k\|_\infty$.

(B3) The distribution of $X_1$ has a bounded Lebesgue density $p(\cdot)$ on
$\mathbb{R}^d$.

(B4) $h_n \to 0$ and $\log(1/h_n) = O(\log n)$ as $n \to \infty$.

(B5) $C_{I \times \mathcal{G}} := \sup_{(x, g) \in I \times \mathcal{G}} |c_g(x)| < \infty$.
Moreover, for every $(x_m, g_m) \in I \times \mathcal{G}$ with
$x_m \to x \in I$ and $g_m \to g \in \mathcal{G}$ pointwise,
$c_{g_m}(x_m) \to c_g(x)$.

We first assume that $\mathcal{G}$ is uniformly bounded, which will be relaxed
later. [33], Lemma 22, gives simple sufficient conditions under which
$\mathcal{K}$ is VC type.

Proposition 3.1. Suppose that assumptions (B1)-(B5) are satisfied. Then
for every $n \ge 1$, there is a tight Gaussian random element $B_n$ in
$\ell^\infty(I \times \mathcal{G})$ with mean zero and covariance function (4),
and there is a sequence $\tilde W_n$ of random variables such that
$\tilde W_n \stackrel{d}{=} \sup_{(x, g) \in I \times \mathcal{G}} B_n(x, g)$
and, as $n \to \infty$,

$$|W_n - \tilde W_n| = O_P\{(n h_n^d)^{-1/6} \log n + (n h_n^d)^{-1/4} \log^{5/4} n + (n h_n^d)^{-1/2} \log^{3/2} n\}.$$

Even when $\mathcal{G}$ is not uniformly bounded, a version of Proposition 3.1
continues to hold provided that suitable restrictions on the moments of the
envelope of $\mathcal{G}$ are assumed. Instead of assumption (B1), we make
the following assumption.

(B1)' $\mathcal{G}$ is a pointwise measurable class of functions
$\mathcal{U} \to \mathbb{R}$ with measurable envelope $G$ such that
$E[G^q(Y_1)] < \infty$ for some $q \ge 4$ and
$\sup_{x \in \mathbb{R}^d} E[G^4(Y_1) \mid X_1 = x] < \infty$. Moreover,
$\mathcal{G}$ is VC type with envelope $G$.

Then we have the following proposition.

Proposition 3.2. Suppose that assumptions (B1)' and (B2)-(B5) are
satisfied. Then the conclusion of Proposition 3.1 continues to hold, except
that the speed of approximation is

$$O_P\{(n h_n^d)^{-1/6} \log n + (n h_n^d)^{-1/4} \log^{5/4} n + (n^{1-2/q} h_n^d)^{-1/2} \log^{3/2} n\}.$$

Remark 3.1. It is instructive to compare Propositions 3.1 and 3.2 with the
implications of Theorem 1.1 of Rio [35].

1. Clearly, Rio's Theorem 1.1 is not applicable to the case where the
envelope function $G$ is not bounded. Hence Proposition 3.2 is not implied
by that theorem. Indeed, we are not aware of any previous result that leads
to the conclusion of Proposition 3.2, at least in this generality.

2. In the special case of kernel density estimation (i.e., $g \equiv 1$),
Rio's Theorem 1.1 implies (subject to some regularity conditions) that
$|W_n - \tilde W_n| = O_{a.s.}\{(n h_n^d)^{-1/(2d)} \sqrt{\log n} + (n h_n^d)^{-1/2} \log n\}$
for $d \ge 2$ (the $d = 1$ case is formally excluded from [35]). Hence Rio's
rates are better than ours when $d = 2, 3$, but worse when $d \ge 4$ (aside
from the difference between "in probability" and almost sure bounds).

3. On the other hand, consider, as a second example, kernel regression
estimation (i.e., $\mathcal{U} = \mathbb{R}$ and $g(y) = y$). In order to
formally apply Rio's Theorem 1.1 to this example, we need to assume that the
joint distribution of $(Y_1, X_1)$ is supported on $[0,1]^{d+1}$ (this
condition can be weakened in such a way that the support of $(Y_1, X_1)$ is a
bounded rectangle in $\mathbb{R}^{d+1}$), and admits a continuous and positive
Lebesgue density on $[0,1]^{d+1}$. Subject to such side conditions, Rio's
Theorem 1.1 leads to
$|W_n - \tilde W_n| = O_{a.s.}\{(n^{d/(d+1)} h_n^d)^{-1/(2d)} \sqrt{\log n} + (n h_n^d)^{-1/2} \log n\}$.
See, e.g., [8], Theorem 8. Hence, aside from the difference between "in
probability" and almost sure bounds, as long as $h_n = O(n^{-a})$ for some
$a > 0$, our rates are always better when $d \ge 2$. When $d = 1$, up to a log
term, our rate is better as long as $n h_n^4 \to 0$ (and vice versa).

Remark 3.2. Importantly, notice that both Propositions 3.1 and 3.2 require
only mild restrictions on the bandwidth $h_n$. Up to logs, Proposition 3.1
requires $n h_n^d \to \infty$, and Proposition 3.2 requires
$n^{1-2/q} h_n^d \to \infty$. Interestingly, they essentially coincide with the
conditions on the bandwidth used in establishing exact rates of uniform
strong consistency of kernel type estimators in [lil . 17].
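
To see how the bandwidth conditions in Remark 3.2 arise, it can help to evaluate the three rate terms numerically. The sketch below ignores all constants and simply computes $(n h^d)^{-1/6} \log n$, $(n h^d)^{-1/4} \log^{5/4} n$ and the third term, which is $(n h^d)^{-1/2} \log^{3/2} n$ in the bounded case of Proposition 3.1 and $(n^{1-2/q} h^d)^{-1/2} \log^{3/2} n$ under the moment condition of Proposition 3.2. The function name and the convention `q=None` for the bounded case are assumptions made for this illustration.

```python
import math
from typing import Optional, Tuple


def kernel_rate_terms(n: int, h: float, d: int, q: Optional[float] = None) -> Tuple[float, float, float]:
    """Rate terms of Propositions 3.1 (q=None) and 3.2 (finite q), with constants dropped."""
    if n < 3 or h <= 0 or d < 1:
        raise ValueError("need n >= 3, h > 0 and d >= 1")
    log_n = math.log(n)
    nh = n * h ** d  # effective local sample size n * h^d
    third_base = nh if q is None else n ** (1.0 - 2.0 / q) * h ** d
    return (
        nh ** (-1.0 / 6.0) * log_n,
        nh ** (-1.0 / 4.0) * log_n ** 1.25,
        third_base ** (-0.5) * log_n ** 1.5,
    )
```

Only the third term depends on $q$; as $q$ grows it approaches the bounded-case term, consistent with the two conditions $n h_n^d \to \infty$ and $n^{1-2/q} h_n^d \to \infty$ (up to logs).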



3.2. Series estimation. This section considers an application of Theorem
2.1 to series estimation in nonparametric regression. Consider a canonical
nonparametric regression model

$$Y_i = m(X_i) + \eta_i, \quad E[\eta_i \mid X_i] = 0, \quad E[\eta_i^2 \mid X_i] = \sigma^2 > 0, \quad 1 \le i \le n,$$

where $Y_i$ is a scalar response variable, $X_i$ is a $d$-vector of covariates
whose support is $[0,1]^d$, and $\eta_i$ is a scalar unobservable error term.
We assume that the data $(Y_1, X_1), \dots, (Y_n, X_n)$ are i.i.d. The
parameter of interest is the conditional mean function
$m(x) = E[Y_1 \mid X_1 = x]$.

We consider series estimation of $m(x)$. Suppose that for each $K \ge 1$,
there are $K$ basis functions $\psi_{K,1}, \dots, \psi_{K,K}$ defined on
$[0,1]^d$. Let $\psi^K(x) = (\psi_{K,1}(x), \dots, \psi_{K,K}(x))^T$. Examples
of such basis functions are Fourier series, splines, Cohen-Daubechies-Vial
(CDV) wavelet bases (§], Hermite polynomials and so on. Let $K_n$ be a
sequence of positive constants such that $K_n \to \infty$ as $n \to \infty$.
The idea of series estimation is to approximate $m(x)$ by
$\sum_{j=1}^{K_n} \theta_{K_n,j} \psi_{K_n,j}(x)$ and to estimate the vector
$\theta^{K_n} = (\theta_{K_n,1}, \dots, \theta_{K_n,K_n})^T$ by the least
squares method:

$$\hat\theta^{K_n} = \arg\min_{\theta^{K_n}} \sum_{i=1}^n \big(Y_i - \psi^{K_n}(X_i)^T \theta^{K_n}\big)^2.$$

The resulting estimate of $m(x)$ is given by
$\hat m(x) = \psi^{K_n}(x)^T \hat\theta^{K_n}$.
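
The least squares step can be sketched in a few lines for $d = 1$. The example below assumes, purely for illustration, the cosine basis $\psi_{K,1}(x) = 1$, $\psi_{K,j}(x) = \sqrt{2} \cos((j-1)\pi x)$, which is orthonormal with respect to the uniform distribution on $[0,1]$; the function names and the hand-written linear solver are hypothetical helpers, not the authors' implementation.

```python
import math
from typing import List, Sequence


def cosine_basis(x: float, K: int) -> List[float]:
    """psi^K(x): 1, sqrt(2) cos(pi x), ..., sqrt(2) cos((K - 1) pi x)."""
    return [1.0] + [math.sqrt(2.0) * math.cos(j * math.pi * x) for j in range(1, K)]


def solve_linear(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A z = b by Gaussian elimination with partial pivoting (A assumed nonsingular)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def series_fit(xs: Sequence[float], ys: Sequence[float], K: int) -> List[float]:
    """Least squares coefficients: solve the normal equations (Psi^T Psi) theta = Psi^T y."""
    Psi = [cosine_basis(x, K) for x in xs]
    gram = [[sum(p[j] * p[k] for p in Psi) for k in range(K)] for j in range(K)]
    rhs = [sum(p[j] * y for p, y in zip(Psi, ys)) for j in range(K)]
    return solve_linear(gram, rhs)


def series_predict(theta: Sequence[float], x: float) -> float:
    """Fitted value psi^K(x)^T theta."""
    return sum(t * p for t, p in zip(theta, cosine_basis(x, len(theta))))
```

With noiseless data generated from a function in the span of the first $K$ basis functions, `series_fit` recovers the coefficients up to rounding error, which is a convenient sanity check before adding noise.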

The asymptotic properties of the series estimate have been thoroughly
investigated in the literature. Importantly, under suitable regularity
conditions, $\hat m(x)$ admits an asymptotic linear form

$$\hat m(x) - m(x) \approx \psi^{K_n}(x)^T \big(E[\psi^{K_n}(X_1) \psi^{K_n}(X_1)^T]\big)^{-1} \Big[\frac{1}{n} \sum_{i=1}^n \psi^{K_n}(X_i) \eta_i\Big].$$

See, e.g., [32]. [To make this approximation precise, we have to use
undersmoothing to make the effect of the bias negligible relative to the
right side. However, we skip the discussion on control of the bias since it
is out of the scope of this paper.] Redefining $\psi^{K_n}(x)$ by

$$\psi^{K_n}(x) \leftarrow \big(E[\psi^{K_n}(X_1) \psi^{K_n}(X_1)^T]\big)^{-1/2} \psi^{K_n}(x),$$

we have the following formal approximation:

$$\frac{\sqrt{n} \, (\hat m(x) - m(x))}{\sigma |\psi^{K_n}(x)|} \approx \frac{1}{\sigma} \frac{\psi^{K_n}(x)^T}{|\psi^{K_n}(x)|} \Big[\frac{1}{\sqrt{n}} \sum_{i=1}^n \eta_i \psi^{K_n}(X_i)\Big] =: \frac{1}{\sigma} S_n(x).$$

Therefore, for the purpose of making uniform inference on $m(x)$ over a
Borel subset $I$ of $[0,1]^d$, it is desirable to have a (tractable)
distributional approximation of the following quantity:

$$W_n = \sup_{x \in I} S_n(x).$$

We address this problem in what follows. Let $B_n$ be a centered Gaussian
process indexed by $I$ with covariance function

$$E[B_n(x) B_n(x')] = \sigma^2 \alpha_{K_n}(x)^T \alpha_{K_n}(x'), \tag{5}$$

where $\alpha_K(x) = \psi^K(x)/|\psi^K(x)|$. Intuitively, it is expected that
under suitable regularity conditions, there is a sequence $\tilde W_n$ of
random variables such that
$\tilde W_n \stackrel{d}{=} \sup_{x \in I} B_n(x)$ and, as $n \to \infty$,
$|W_n - \tilde W_n| \stackrel{P}{\to} 0$. We shall argue the validity of this
approximation with explicit rates.
We make the following assumptions.

(C1) For each $K \ge 1$, $E[\psi^K(X_1) \psi^K(X_1)^T] = I_K$,
$b_K := \sup_{x \in [0,1]^d} |\psi^K(x)| \vee 1 < \infty$, and the map
$x \mapsto \psi^K(x)/|\psi^K(x)| =: \alpha_K(x)$ is Lipschitz continuous with
Lipschitz constant $\le L_K$ ($\ge 1$), i.e.,

$$|\alpha_K(x) - \alpha_K(x')| \le L_K |x - x'|, \quad \forall x, x' \in [0,1]^d.$$

(C2) $\log b_{K_n} = O(\log n)$ and $\log L_{K_n} = O(\log n)$ as
$n \to \infty$.

For many commonly used basis functions such as Fourier series, splines
and CDV wavelet bases, $b_K = O(\sqrt{K})$ as $K \to \infty$. See [33]. The
Lipschitz continuity of $\alpha_K(x)$ is implied if
$\inf_{x \in [0,1]^d} |\psi^K(x)| > 0$ and $\psi^K(x)$ is Lipschitz continuous.
Condition (C2) states mild growth restrictions on $K_n$ and $L_{K_n}$ and is
usually satisfied.

Proposition 3.3. Suppose that assumptions (C1) and (C2) are satisfied.
Moreover, suppose either (i) $\eta_1$ is bounded, or (ii)
$E[|\eta_1|^q] < \infty$ for some $q \ge 4$ and
$\sup_{x \in [0,1]^d} E[\eta_1^4 \mid X_1 = x] < \infty$. Then for every
$n \ge 1$, there is a tight Gaussian random element $B_n$ in $\ell^\infty(I)$
with mean zero and covariance function (5), and there is a sequence
$\tilde W_n$ of random variables such that
$\tilde W_n \stackrel{d}{=} \sup_{x \in I} B_n(x)$ and, as $n \to \infty$,

$$|W_n - \tilde W_n| = \begin{cases}
O_P\{n^{-1/6} b_{K_n}^{1/3} \log n + n^{-1/4} b_{K_n}^{1/2} \log^{5/4} n + n^{-1/2} b_{K_n} \log^{3/2} n\}, & \text{(i)}, \\
O_P\{n^{-1/6} b_{K_n}^{1/3} \log n + n^{-1/4} b_{K_n}^{1/2} \log^{5/4} n + n^{-1/2+1/q} b_{K_n} \log^{3/2} n\}, & \text{(ii)}.
\end{cases}$$

Proposition 3.3 is a new result. Suppose that $b_K = O(\sqrt{K})$ as
$K \to \infty$. Up to logs, Proposition 3.3 requires $K_n/n \to 0$ in case (i)
and $K_n/n^{1-2/q} \to 0$ in case (ii). These requirements are mild, in view
of the fact that at least $K_n/n \to 0$ is needed for consistency (in the
$L^2$-norm) of the series estimator [see, e.g., [21]]. Another approach to
deduce a result similar to Proposition 3.3 is to apply Yurinskii's coupling
(see Theorem 4.2 ahead) to the random vectors $\eta_i \psi^{K_n}(X_i)$,
$i = 1, \dots, n$, which, however, requires a rather stringent restriction on
$K_n$, namely $K_n^5/n \to 0$ (up to a log term), for ensuring
$|W_n - \tilde W_n| \stackrel{P}{\to} 0$. See, e.g., |g|, Theorem 7.

4. A coupling inequality for maxima of sums of random vectors

The main ingredient in the proof of Theorem 2.1 is a new coupling
inequality for maxima of sums of random vectors, which is stated as follows.

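Theorem 4.1. Let $X_1, \dots, X_n$ be independent random vectors in
$\mathbb{R}^p$ with mean zero and finite absolute third moments, i.e.,
$E[X_{ij}] = 0$ and $E[|X_{ij}|^3] < \infty$ for all $1 \le i \le n$ and
$1 \le j \le p$. Consider the statistic

$$Z = \max_{1 \le j \le p} \sum_{i=1}^n X_{ij}.$$

Let $Y_1, \dots, Y_n$ be independent random vectors in $\mathbb{R}^p$ such that

$$Y_i \sim N(0, E[X_i X_i^T]), \quad 1 \le i \le n.$$

Then for every $\beta > 0$ and $\delta > 1/\beta$, there is a random variable
$\tilde Z \stackrel{d}{=} \max_{1 \le j \le p} \sum_{i=1}^n Y_{ij}$ such that

$$\mathbb{P}(|Z - \tilde Z| > 2\beta^{-1} \log p + 3\delta) \le \frac{\varepsilon + C \beta \delta^{-1} \{B_1 + \beta (B_2 + B_3)\}}{1 - \varepsilon},$$

where $\varepsilon = \varepsilon_{\beta,\delta}$ is given by

$$\varepsilon = \sqrt{e^{-\alpha} (1 + \alpha)} < 1, \quad \alpha = \beta^2 \delta^2 - 1 > 0,$$

and

$$\begin{aligned}
B_1 &= E\Big[\max_{1 \le j, k \le p} \Big|\sum_{i=1}^n (X_{ij} X_{ik} - E[X_{ij} X_{ik}])\Big|\Big], \\
B_2 &= E\Big[\max_{1 \le j \le p} \sum_{i=1}^n |X_{ij}|^3\Big], \\
B_3 &= \sum_{i=1}^n E\Big[\max_{1 \le j \le p} |X_{ij}|^3 \cdot 1\Big(\max_{1 \le j \le p} |X_{ij}| > \beta^{-1}/2\Big)\Big].
\end{aligned}$$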



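The following corollary is useful for many applications. Recall $n \ge 3$.

Corollary 4.1. Consider the same setup as in Theorem 4.1. Then for every
$\delta > 0$, there is a random variable $\widetilde{Z} \stackrel{d}{=} \max_{1 \le j \le p} \sum_{i=1}^n Y_{ij}$ such that

$$P(|Z - \widetilde{Z}| > 16\delta) \lesssim \delta^{-2}\{B_1 + \delta^{-1}(B_2 + B_4)\log(p \vee n)\}\log(p \vee n) + \frac{\log n}{n}, \quad (6)$$

where $B_1$ and $B_2$ are as in Theorem 4.1, and

$$B_4 = \sum_{i=1}^n E\Big[\max_{1 \le j \le p} |X_{ij}|^3 \cdot 1\Big(\max_{1 \le j \le p} |X_{ij}| > \delta/\log(p \vee n)\Big)\Big].$$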



Proof of Corollary 4.1. In Theorem 4.1, take $\beta = 2\delta^{-1}\log(p \vee n)$. Then
$\alpha = \beta^2\delta^2 - 1 = 4\log^2(p \vee n) - 1 \ge 2\log(p \vee n)$ (recall $n \ge 3 > e$), so that
$\varepsilon \le 2\log(p \vee n)/(p \vee n) \le 2n^{-1}\log n$. This completes the proof. □
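
The arithmetic in this proof can be checked numerically. The sketch below is only an illustration and assumes that the constant in Theorem 4.1 reads $\varepsilon = \sqrt{e^{-\alpha}(1+\alpha)}$ with $\alpha = \beta^2\delta^2 - 1$; the function names and the test values of $(p, n, \delta)$ are hypothetical choices made for this example.

```python
import math

def epsilon(beta: float, delta: float) -> float:
    """Assumed form: epsilon = sqrt(exp(-alpha) * (1 + alpha)), alpha = beta^2 delta^2 - 1."""
    alpha = beta**2 * delta**2 - 1.0
    if alpha <= 0:
        raise ValueError("need delta > 1/beta so that alpha > 0")
    return math.sqrt(math.exp(-alpha) * (1.0 + alpha))

def corollary_choice(p: int, n: int, delta: float):
    """Return (alpha, epsilon, bound) for the choice beta = 2 * log(max(p, n)) / delta."""
    log_pn = math.log(max(p, n))
    beta = 2.0 * log_pn / delta
    alpha = beta**2 * delta**2 - 1.0
    bound = 2.0 * log_pn / max(p, n)   # the claimed upper bound on epsilon
    return alpha, epsilon(beta, delta), bound

# Illustrative values only; n >= 3 as in the corollary.
for p, n in [(1, 3), (10, 50), (10**4, 100), (5, 10**6)]:
    alpha, eps, bound = corollary_choice(p, n, delta=0.3)
    print(p, n, alpha >= 2 * math.log(max(p, n)), eps <= bound)
```

Both comparisons hold for every pair in the loop; for very large $p \vee n$ the value of `epsilon` underflows to zero in floating point, which is harmless here.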

Remark 4.1. Inspection of the proof shows that the $n^{-1}\log n$ term on the
right side can be replaced by $n^{-a}$ for any $a > 0$, at the cost of changing the constant
"16" on the left side.

Theorem 4.1 is a coupling inequality similar in nature to Yurinskii's [42].
Before proving Theorem 4.1, let us first recall Yurinskii's coupling inequality.

Theorem 4.2 ([42]; see also [26]). Consider the same setup as in Theorem
4.1. Let $S_n = \sum_{i=1}^n X_i$. Then for every $\delta > 0$, there is a random vector
$T_n \stackrel{d}{=} \sum_{i=1}^n Y_i$ such that

$$P(|S_n - T_n| > 3\delta) \lesssim B_0\Big(1 + \frac{|\log(1/B_0)|}{p}\Big),$$

where

$$B_0 = p\,\delta^{-3}\sum_{i=1}^n E[|X_i|^3].$$

For the proof, see [34], Section 10.4. Because of the general fact that
$\max_{1 \le j \le p} |x_j| \le |x|$ for $x \in \mathbb{R}^p$, one has

$$\Big|\max_{1 \le j \le p}(S_n)_j - \max_{1 \le j \le p}(T_n)_j\Big| \le \max_{1 \le j \le p}|(S_n - T_n)_j| \le |S_n - T_n|.$$

Hence if we take $\widetilde{Z} = \max_{1 \le j \le p}(T_n)_j$,

$$P(|Z - \widetilde{Z}| > 3\delta) \lesssim B_0\Big(1 + \frac{|\log(1/B_0)|}{p}\Big). \quad (7)$$
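
The reduction from a coupling of vectors to a coupling of maxima rests on the elementary chain $|\max_j s_j - \max_j t_j| \le \max_j |s_j - t_j| \le |s - t|$. A minimal sketch that checks this chain on random inputs follows; the helper name `max_gap_chain` is a hypothetical name used only for this illustration.

```python
import math
import random

def max_gap_chain(s, t):
    """Return (|max s - max t|, max_j |s_j - t_j|, Euclidean norm of s - t)."""
    lhs = abs(max(s) - max(t))
    mid = max(abs(a - b) for a, b in zip(s, t))
    rhs = math.sqrt(sum((a - b) ** 2 for a, b in zip(s, t)))
    return lhs, mid, rhs

rng = random.Random(0)
for _ in range(1000):
    p = rng.randint(1, 20)
    s = [rng.gauss(0, 1) for _ in range(p)]
    t = [rng.gauss(0, 1) for _ in range(p)]
    lhs, mid, rhs = max_gap_chain(s, t)
    assert lhs <= mid + 1e-12 and mid <= rhs + 1e-12
```

The last inequality can be loose by a factor of $\sqrt{p}$: `max_gap_chain([1] * 100, [0] * 100)` returns `(1, 1, 10.0)`. This is one way to see why a bound stated for the Euclidean distance may be crude when only the maximum matters.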

Unfortunately, when $p$ is large, the right side is typically too crude. This is
because $B_0$ is proportional to $p\sum_{i=1}^n E[|X_i|^3]$, and this quantity may be larger
than what we want.

To better understand the difference between (6) and (7), consider the
situation in which $p$ is indexed by $n$ and $p = p_n \to \infty$ as $n \to \infty$. Moreover,
consider the simple case where $X_{ij} = x_{ij}/\sqrt{n}$ and $|x_{ij}| \le b$ ($x_{ij}$ are random;
$b$ is a fixed constant). Then

$$B_1 = O(n^{-1/2}\log^{1/2} p_n), \quad B_2 + B_4 = O(n^{-1/2}).$$

The former estimate is deduced from the fact that, using the symmetrization
and the maximal inequality for Rademacher averages conditional on
$X_1, \dots, X_n$ [use [40], Lemmas 2.2.2 and 2.2.7], one has

$$B_1 \lesssim \sqrt{\log(1 + p)}\; E\Big[\max_{1 \le j,k \le p}\Big(\sum_{i=1}^n X_{ij}^2 X_{ik}^2\Big)^{1/2}\Big].$$

On the other hand,

$$p_n \sum_{i=1}^n E[|X_i|^3] = O(n^{-1/2} p_n^{5/2}).$$

Therefore, the former (6) allows $p_n$ to be of an exponential order ($p_n$ can
be as large as $\log p_n = o(n^{1/4})$; hence, e.g., $p_n$ can be of order $e^{n^a}$ for
$0 < a < 1/4$), while the latter (7) restricts $p_n$ to be $p_n = o(n^{1/5})$.

Remark 4.2. The importance of Theorem 4.1 in the context of the proof of
Theorem 2.1 is described as follows. In the proof of Theorem 2.1, we make
a finite approximation of $\mathcal{F}$ by a minimal $\varepsilon\|F\|_{P,2}$-net of $(\mathcal{F}, e_P)$ and apply
Theorem 4.1 to the "discretized" empirical process; hence in this application,
$p = N(\mathcal{F}, e_P, \varepsilon\|F\|_{P,2})$. The fact that Theorem 4.1 allows for "large" $p$
means that a "finer" discretization is possible, and as a result, the
bound in Theorem 2.1 depends on the covering number $N(\mathcal{F}, e_P, \varepsilon\|F\|_{P,2})$
only through its log: $\log N(\mathcal{F}, e_P, \varepsilon\|F\|_{P,2})$.

We will use a version of Strassen's theorem to prove Theorem 4.1. We
state it for the reader's convenience.

Lemma 4.1. Let $\mu$ and $\nu$ be Borel probability measures on $\mathbb{R}$. Let $\varepsilon > 0$
and $\delta > 0$ be two positive constants. Suppose that $\mu(A) \le \nu(A^\delta) + \varepsilon$
for every Borel subset $A$ of $\mathbb{R}$. Let $V$ be a random variable defined on a
probability space $(\Omega, \mathcal{A}, P)$ with distribution $\mu$. Suppose that the probability
space $(\Omega, \mathcal{A}, P)$ admits a uniform random variable on $(0, 1)$ independent of $V$.
Then there is a random variable $W$, defined on $(\Omega, \mathcal{A}, P)$, with distribution
$\nu$ such that $P(|V - W| > \delta) \le \varepsilon$.

Proof. By Strassen's theorem [see [13, Section 10.3]], there are random variables
$V^*$ and $W^*$ with distributions $\mu$ and $\nu$ such that $P(|V^* - W^*| > \delta) \le \varepsilon$.
$V^*$ may be different from $V$. Let $F(w \mid v)$ be a regular conditional distribution
function of $W^*$ given $V^* = v$. Denote by $F^{-1}(\tau \mid v)$ the quantile
function of $F(w \mid v)$, i.e., $F^{-1}(\tau \mid v) = \inf\{w : F(w \mid v) \ge \tau\}$. Generate
a uniform random variable $U$ on $(0, 1)$ independent of $V$ and take
$W(\omega) = F^{-1}(U(\omega) \mid V(\omega))$. Then it is routine to verify that $(V, W) \stackrel{d}{=} (V^*, W^*)$. □
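
The construction in this proof, transferring a coupling to a given variable $V$ through a conditional quantile function and an independent uniform variable, can be sketched on a finite example. Everything below is an assumption made for illustration: the joint law `joint` is a hypothetical discrete distribution of $(V^*, W^*)$, and the Monte Carlo check only suggests, rather than proves, that $(V, W)$ has the same joint law.

```python
import random

# Hypothetical joint law of (V*, W*): joint[(v, w)] = probability.
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 1): 0.1, (1, 2): 0.4}

def conditional_quantile(r, v):
    """F^{-1}(r | v) = inf{w : F(w | v) >= r} for the law of W* given V* = v."""
    mass_v = sum(p for (vv, _), p in joint.items() if vv == v)
    cum = 0.0
    for w in sorted(w for (vv, w) in joint if vv == v):
        cum += joint[(v, w)] / mass_v
        if cum >= r - 1e-12:          # small tolerance for floating-point sums
            return w
    raise ValueError("r must lie in [0, 1]")

def couple(v, u):
    """Given a value v of V and an independent uniform u, return W = F^{-1}(u | v)."""
    return conditional_quantile(u, v)

rng = random.Random(1)
n = 100_000
counts = {}
for _ in range(n):
    v = 0 if rng.random() < 0.5 else 1      # V has the marginal law of V*: (0.5, 0.5)
    w = couple(v, rng.random())
    counts[(v, w)] = counts.get((v, w), 0) + 1
empirical = {k: c / n for k, c in counts.items()}
```

The dictionary `empirical` should be close to `joint`, which is the content of the final claim $(V, W) \stackrel{d}{=} (V^*, W^*)$ in this toy setting.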

Proof of Theorem 4.1. For notational convenience, write $e_\beta = \beta^{-1}\log p$.
Construct $Y_1, \dots, Y_n$ independent of $X_1, \dots, X_n$. By Lemma 4.1, the
conclusion follows if we can prove that for every Borel subset $A$ of $\mathbb{R}$,

$$P(Z \in A) \le P(Z^* \in A^{2e_\beta + 3\delta}) + \frac{\varepsilon + C\beta\delta^{-1}\{B_1 + \beta(B_2 + B_3)\}}{1 - \varepsilon},$$

where $Z^* := \max_{1 \le j \le p}\sum_{i=1}^n Y_{ij}$. Let $S_n = \sum_{i=1}^n X_i$ and $T_n = \sum_{i=1}^n Y_i$. Fix
any Borel subset $A$ of $\mathbb{R}$. We divide the proof into several steps.

Step 1: We approximate the non-smooth map $x \mapsto 1_A(\max_{1 \le j \le p} x_j)$ by a
smooth function. The first step is to approximate the map $x \mapsto \max_{1 \le j \le p} x_j$
by a smooth function. Consider the function $F_\beta : \mathbb{R}^p \to \mathbb{R}$ defined by

$$F_\beta(x) = \beta^{-1}\log\Big(\sum_{j=1}^p \exp(\beta x_j)\Big),$$

which gives a smooth approximation of $\max_{1 \le j \le p} x_j$. Indeed, an elementary
calculation gives the following inequality: for every $x = (x_1, \dots, x_p)^T \in \mathbb{R}^p$,

$$\max_{1 \le j \le p} x_j \le F_\beta(x) \le \max_{1 \le j \le p} x_j + \beta^{-1}\log p. \quad (8)$$
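
The two-sided bound on the log-sum-exp approximation of the maximum is easy to verify numerically. The following is a small sketch; the name `smooth_max` and the random test inputs are hypothetical, and subtracting the maximum before exponentiating is an implementation choice for numerical stability, not part of the argument.

```python
import math
import random

def smooth_max(x, beta):
    """F_beta(x) = beta^{-1} * log(sum_j exp(beta * x_j)), computed stably."""
    m = max(x)
    return m + math.log(sum(math.exp(beta * (xj - m)) for xj in x)) / beta

rng = random.Random(0)
for _ in range(1000):
    p = rng.randint(1, 50)
    beta = rng.uniform(0.1, 20.0)
    x = [rng.gauss(0, 3) for _ in range(p)]
    f = smooth_max(x, beta)
    # max_j x_j <= F_beta(x) <= max_j x_j + log(p) / beta
    assert max(x) - 1e-12 <= f <= max(x) + math.log(p) / beta + 1e-12
```

The upper bound is attained when all coordinates are equal, so the approximation error $\beta^{-1}\log p$ cannot be improved in general; a larger $\beta$ gives a tighter but less smooth approximation.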

See 0. Hence we have 

$$P(Z \in A) \le P(F_\beta(S_n) \in A^{e_\beta}) = E[1_{A^{e_\beta}}(F_\beta(S_n))].$$

Step 2: The next step is to approximate the indicator function $t \mapsto 1_{A^{e_\beta}}(t)$
by a smooth function. This step is rather standard.

Lemma 4.2. Let $\beta > 0$ and $\delta > 1/\beta$. For every Borel subset $A$ of $\mathbb{R}$,
there is a smooth function $g : \mathbb{R} \to \mathbb{R}$ such that $\|g'\|_\infty \le \delta^{-1}$, $\|g''\|_\infty \le C\beta\delta^{-1}$,
$\|g'''\|_\infty \le C\beta^2\delta^{-1}$, and

$$(1 - \varepsilon)1_A(t) \le g(t) \le \varepsilon + (1 - \varepsilon)1_{A^{3\delta}}(t), \quad \forall t \in \mathbb{R},$$

where $\varepsilon = \varepsilon_{\beta,\delta}$ is given by

$$\varepsilon = \sqrt{e^{-\alpha}(1 + \alpha)} < 1, \quad \alpha = \beta^2\delta^2 - 1 > 0.$$



Proof of Lemma 4.2. The proof is due to [34], Lemma 10.18 (p. 248). Let
$\rho(\cdot, \cdot)$ denote the Euclidean distance on $\mathbb{R}$. Then consider the function
$h(t) = (1 - \rho(t, A^\delta)/\delta)_+$. Note that $h$ is Lipschitz continuous with Lipschitz constant
$\le \delta^{-1}$. Construct a smooth approximation of $h(t)$ by

$$g(t) = \frac{\beta}{\sqrt{2\pi}}\int_{\mathbb{R}} h(s)e^{-\beta^2(s - t)^2/2}\,ds = \frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} h(t + \beta^{-1}s)e^{-s^2/2}\,ds.$$

Then the map $t \mapsto g(t)$ is infinitely differentiable, and

$$\|g'\|_\infty \le \delta^{-1}, \quad \|g''\|_\infty \le C\beta\delta^{-1}, \quad \|g'''\|_\infty \le C\beta^2\delta^{-1}.$$

The rest of the proof is the same as that of [34], Lemma 10.18, and is omitted. □

Apply Lemma 4.2 to $A = A^{e_\beta}$ to construct a suitable function $g$. Then

$$E[1_{A^{e_\beta}}(F_\beta(S_n))] \le (1 - \varepsilon)^{-1}E[g \circ F_\beta(S_n)].$$

Step 3: The next step uses Stein's method to compare $E[g \circ F_\beta(S_n)]$ and
$E[g \circ F_\beta(T_n)]$. The following argument is inspired by [7], Theorem 7. We
first make some complementary computations.

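Lemma 4.3. Let $\beta > 0$. For every $g \in C^3(\mathbb{R})$,

$$\sum_{j,k=1}^p |\partial_j\partial_k(g \circ F_\beta)(x)| \le \|g''\|_\infty + 2\|g'\|_\infty\beta, \quad (9)$$

$$\sum_{j,k,l=1}^p |\partial_j\partial_k\partial_l(g \circ F_\beta)(x)| \le \|g'''\|_\infty + 6\|g''\|_\infty\beta + 6\|g'\|_\infty\beta^2. \quad (10)$$

Moreover, letting $D(\beta^{-1}) = \{y \in \mathbb{R}^p : |y_j| \le \beta^{-1}, 1 \le \forall j \le p\}$,

$$\sum_{j,k,l=1}^p \sup_{y \in D(\beta^{-1})} |\partial_j\partial_k\partial_l(g \circ F_\beta)(x + y)| \le C(\|g'''\|_\infty + \|g''\|_\infty\beta + \|g'\|_\infty\beta^2). \quad (11)$$

Proof of Lemma 4.3. Let $\delta_{jk} = 1(j = k)$. A direct calculation gives

$$\partial_j F_\beta(x) = \pi_j(x), \quad \partial_j\partial_k F_\beta(x) = \beta w_{jk}(x), \quad \partial_j\partial_k\partial_l F_\beta(x) = \beta^2 q_{jkl}(x),$$

where

$$\pi_j(x) = e^{\beta x_j}\Big/\sum_{k=1}^p e^{\beta x_k}, \quad w_{jk}(x) = (\pi_j\delta_{jk} - \pi_j\pi_k)(x),$$

$$q_{jkl}(x) = (\pi_j\delta_{jl}\delta_{jk} - \pi_j\pi_l\delta_{jk} - \pi_j\pi_k(\delta_{jl} + \delta_{kl}) + 2\pi_j\pi_k\pi_l)(x).$$

By these expressions, we have

$$\pi_j(x) \ge 0, \quad \sum_{j=1}^p \pi_j(x) = 1, \quad \sum_{j,k=1}^p |w_{jk}(x)| \le 2, \quad \sum_{j,k,l=1}^p |q_{jkl}(x)| \le 6.$$

Inequalities (9) and (10) follow from these relations and the following computation:

$$\partial_j(g \circ F_\beta)(x) = (g' \circ F_\beta)(x)\pi_j(x),$$

$$\partial_j\partial_k(g \circ F_\beta)(x) = (g'' \circ F_\beta)(x)\pi_j(x)\pi_k(x) + (g' \circ F_\beta)(x)\beta w_{jk}(x),$$

$$\partial_j\partial_k\partial_l(g \circ F_\beta)(x) = (g''' \circ F_\beta)(x)\pi_j(x)\pi_k(x)\pi_l(x) + (g'' \circ F_\beta)(x)\beta\big(w_{jk}(x)\pi_l(x) + w_{jl}(x)\pi_k(x) + w_{kl}(x)\pi_j(x)\big) + (g' \circ F_\beta)(x)\beta^2 q_{jkl}(x).$$

For the last inequality (11), it is standard to see that whenever $|y_j| \le \beta^{-1}$, $1 \le \forall j \le p$,

$$\pi_j(x + y) \le e^2\pi_j(x),$$

from which the desired inequality follows. □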
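
The derivative formulas used in this proof can be sanity-checked by finite differences. The sketch below assumes that the first and second derivatives of the log-sum-exp function are the softmax weights $\pi_j$ and $\beta(\pi_j 1(j = k) - \pi_j\pi_k)$; all function names and the test point are hypothetical, and the finite-difference step is an arbitrary illustrative choice.

```python
import math

def softmax(x, beta):
    """pi_j(x) = exp(beta x_j) / sum_k exp(beta x_k), computed stably."""
    m = max(x)
    e = [math.exp(beta * (xj - m)) for xj in x]
    total = sum(e)
    return [ej / total for ej in e]

def smooth_max(x, beta):
    """F_beta(x) = beta^{-1} log(sum_j exp(beta x_j))."""
    m = max(x)
    return m + math.log(sum(math.exp(beta * (xj - m)) for xj in x)) / beta

def w_matrix(x, beta):
    """w_jk = pi_j * 1(j == k) - pi_j * pi_k."""
    pi = softmax(x, beta)
    p = len(pi)
    return [[(pi[j] if j == k else 0.0) - pi[j] * pi[k] for k in range(p)]
            for j in range(p)]

def shifted(x, j, h):
    y = list(x)
    y[j] += h
    return y

x, beta, h = [0.3, -1.2, 0.7, 0.65], 5.0, 1e-6
pi = softmax(x, beta)
w = w_matrix(x, beta)

# Central differences for the gradient of F_beta and for the Jacobian of pi.
grad = [(smooth_max(shifted(x, j, h), beta) - smooth_max(shifted(x, j, -h), beta)) / (2 * h)
        for j in range(len(x))]
grad_error = max(abs(g - q) for g, q in zip(grad, pi))
hess_error = max(
    abs((softmax(shifted(x, k, h), beta)[j] - softmax(shifted(x, k, -h), beta)[j]) / (2 * h)
        - beta * w[j][k])
    for j in range(len(x)) for k in range(len(x)))
abs_w_sum = sum(abs(v) for row in w for v in row)
```

Here `abs_w_sum` equals $2(1 - \sum_j \pi_j^2)$, which makes the bound $\sum_{j,k}|w_{jk}| \le 2$ visible.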

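For $i = 1, \dots, n$, let $X_i'$ be an independent copy of $X_i$. Let $I$ be a uniform
random variable on $\{1, \dots, n\}$ independent of all the other variables. Define

$$S_n' = S_n - X_I + X_I'.$$

For $\lambda \in \mathbb{R}^p$,

$$E[e^{\sqrt{-1}\lambda^T S_n'}] = \frac{1}{n}\sum_{i=1}^n E[e^{\sqrt{-1}\lambda^T X_i'}]\prod_{j \ne i} E[e^{\sqrt{-1}\lambda^T X_j}] = \prod_{i=1}^n E[e^{\sqrt{-1}\lambda^T X_i}] = E[e^{\sqrt{-1}\lambda^T S_n}].$$

Hence $S_n' \stackrel{d}{=} S_n$. Also with $X_1^n = \{X_1, \dots, X_n\}$,

$$E[S_n' - S_n \mid X_1^n] = E[X_I' - X_I \mid X_1^n] = \frac{1}{n}\sum_{i=1}^n E[X_i' - X_i \mid X_1^n] = -n^{-1}S_n, \quad (12)$$

and

$$E[(S_n' - S_n)(S_n' - S_n)^T \mid X_1^n] = E[(X_I' - X_I)(X_I' - X_I)^T \mid X_1^n]$$

$$= \frac{1}{n}\sum_{i=1}^n (E[X_iX_i^T] + X_iX_i^T) = \frac{2}{n}\sum_{i=1}^n E[X_iX_i^T] + \frac{1}{n}\sum_{i=1}^n (X_iX_i^T - E[X_iX_i^T])$$

$$= \frac{2}{n}\sum_{i=1}^n E[X_iX_i^T] + n^{-1}V, \quad (13)$$

where $V$ is the $p \times p$ matrix defined by

$$V = (V_{jk})_{1 \le j,k \le p} = \sum_{i=1}^n (X_iX_i^T - E[X_iX_i^T]).$$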

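For notational convenience, write $f = g \circ F_\beta$. Consider

$$h(x) = \int_0^1 \frac{1}{2t}E[f(\sqrt{t}\,x + \sqrt{1 - t}\,T_n) - f(T_n)]\,dt.$$

Then Lemma 1 of [31] implies

$$\sum_{j=1}^p x_j\partial_j h(x) - \sum_{j,k=1}^p\sum_{i=1}^n E[X_{ij}X_{ik}]\partial_j\partial_k h(x) = f(x) - E[f(T_n)],$$

and especially

$$E[f(S_n)] - E[f(T_n)] = E\Big[\sum_{j=1}^p\sum_{i=1}^n X_{ij}\partial_j h(S_n)\Big] - E\Big[\sum_{j,k=1}^p\sum_{i=1}^n E[X_{ij}X_{ik}]\partial_j\partial_k h(S_n)\Big]. \quad (14)$$

Denote by $\nabla h(x)$ and $\mathrm{Hess}\, h(x)$ the gradient vector and the Hessian matrix
of $h(x)$, respectively. Let

$$R = h(S_n') - h(S_n) - (S_n' - S_n)^T\nabla h(S_n) - 2^{-1}(S_n' - S_n)^T(\mathrm{Hess}\, h(S_n))(S_n' - S_n).$$

Then one has

$$0 = nE[h(S_n') - h(S_n)] \quad (\text{since } S_n' \stackrel{d}{=} S_n)$$

$$= nE[(S_n' - S_n)^T\nabla h(S_n) + 2^{-1}(S_n' - S_n)^T(\mathrm{Hess}\, h(S_n))(S_n' - S_n) + R]$$

$$= nE\Big[E[(S_n' - S_n)^T \mid X_1^n]\nabla h(S_n) + 2^{-1}\mathrm{Tr}\big((\mathrm{Hess}\, h(S_n))E[(S_n' - S_n)(S_n' - S_n)^T \mid X_1^n]\big) + R\Big]$$

$$= -E\Big[\sum_{j=1}^p\sum_{i=1}^n X_{ij}\partial_j h(S_n)\Big] + E\Big[\sum_{j,k=1}^p\sum_{i=1}^n E[X_{ij}X_{ik}]\partial_j\partial_k h(S_n)\Big] + E\Big[\frac{1}{2}\sum_{j,k=1}^p V_{jk}\partial_j\partial_k h(S_n) + nR\Big] \quad (\text{by (12) and (13)})$$

$$= -E[f(S_n)] + E[f(T_n)] + E\Big[\frac{1}{2}\sum_{j,k=1}^p V_{jk}\partial_j\partial_k h(S_n) + nR\Big], \quad (\text{by (14)})$$

that is,

$$E[f(S_n)] - E[f(T_n)] = E\Big[\frac{1}{2}\sum_{j,k=1}^p V_{jk}\partial_j\partial_k h(S_n) + nR\Big].$$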

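Using Lemma 4.3, one has

$$\sum_{j,k=1}^p |V_{jk}\partial_j\partial_k h(S_n)| \le \max_{1 \le j,k \le p}|V_{jk}|\sum_{j,k=1}^p |\partial_j\partial_k h(S_n)| \le C\beta\delta^{-1}\max_{1 \le j,k \le p}|V_{jk}|,$$

and with $\Delta_i := (\Delta_{i1}, \dots, \Delta_{ip})^T := X_i' - X_i$,

$$|E[nR]| = \frac{1}{2}\Big|E\Big[\sum_{i=1}^n\sum_{j,k,l=1}^p \Delta_{ij}\Delta_{ik}\Delta_{il}(1 - \theta)^2\partial_j\partial_k\partial_l h(S_n + \theta\Delta_i)\Big]\Big|$$

($\theta \sim U(0, 1)$ independent of all the other variables)

$$\le \frac{1}{2}E\Big[\sum_{i=1}^n\sum_{j,k,l=1}^p |\Delta_{ij}\Delta_{ik}\Delta_{il}| \cdot |\partial_j\partial_k\partial_l h(S_n + \theta\Delta_i)|\Big]. \quad (15)$$

Let $\chi_i = 1(\max_{1 \le j \le p}|\Delta_{ij}| \le \beta^{-1})$ and $\chi_i^c := 1 - \chi_i$. Then

$$\frac{1}{2}E\Big[\sum_{i=1}^n\sum_{j,k,l=1}^p |\Delta_{ij}\Delta_{ik}\Delta_{il}| \cdot |\partial_j\partial_k\partial_l h(S_n + \theta\Delta_i)|\Big] = \frac{1}{2}E\Big[\sum_{i=1}^n \chi_i\sum_{j,k,l=1}^p |\Delta_{ij}\Delta_{ik}\Delta_{il}| \cdot |\partial_j\partial_k\partial_l h(S_n + \theta\Delta_i)|\Big] + \frac{1}{2}E\Big[\sum_{i=1}^n \chi_i^c\sum_{j,k,l=1}^p |\Delta_{ij}\Delta_{ik}\Delta_{il}| \cdot |\partial_j\partial_k\partial_l h(S_n + \theta\Delta_i)|\Big] =: \frac{1}{2}[(A) + (B)].$$

Observe that

$$(A) \le E\Big[\sum_{j,k,l=1}^p \max_{1 \le i \le n}\big(\chi_i \cdot |\partial_j\partial_k\partial_l h(S_n + \theta\Delta_i)|\big) \times \max_{1 \le j,k,l \le p}\sum_{i=1}^n |\Delta_{ij}\Delta_{ik}\Delta_{il}|\Big]$$

$$\le C\beta^2\delta^{-1}E\Big[\max_{1 \le j,k,l \le p}\sum_{i=1}^n |\Delta_{ij}\Delta_{ik}\Delta_{il}|\Big] \quad (\text{by (11)})$$

$$\le C\beta^2\delta^{-1}E\Big[\max_{1 \le j \le p}\sum_{i=1}^n |\Delta_{ij}|^3\Big] \le C\beta^2\delta^{-1}E\Big[\max_{1 \le j \le p}\sum_{i=1}^n |X_{ij}|^3\Big] = C\beta^2\delta^{-1}B_2,$$

and

$$(B) \le C\beta^2\delta^{-1}\sum_{i=1}^n E\Big[\chi_i^c\max_{1 \le j \le p}|\Delta_{ij}|^3\Big] \quad (\text{by (10)})$$

$$\le C\beta^2\delta^{-1}\sum_{i=1}^n E\Big[\chi_i^c\max_{1 \le j \le p}|X_{ij}|^3\Big]. \quad (\text{by symmetry})$$

Because

$$\chi_i^c \le 1\Big(\max_{1 \le j \le p}|X_{ij}| > \beta^{-1}/2\Big) + 1\Big(\max_{1 \le j \le p}|X_{ij}'| > \beta^{-1}/2\Big),$$

we have

$$E\Big[\chi_i^c\max_{1 \le j \le p}|X_{ij}|^3\Big] \le E\Big[\max_{1 \le j \le p}|X_{ij}|^3 \cdot 1\Big(\max_{1 \le j \le p}|X_{ij}| > \beta^{-1}/2\Big)\Big] + E\Big[\max_{1 \le j \le p}|X_{ij}|^3 \cdot 1\Big(\max_{1 \le j \le p}|X_{ij}'| > \beta^{-1}/2\Big)\Big]. \quad (16)$$

We recall the following lemma.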



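Lemma 4.4. Let $\varphi$ and $\psi$ be two functions defined on an interval $\mathcal{I}$ in $\mathbb{R}$.
Let $\xi$ be a random variable such that $P(\xi \in \mathcal{I}) = 1$. Suppose that $E[|\varphi(\xi)|] < \infty$,
$E[|\psi(\xi)|] < \infty$ and $E[|\varphi(\xi)\psi(\xi)|] < \infty$. Then $\mathrm{Cov}(\varphi(\xi), \psi(\xi)) \ge 0$ if $\varphi$
and $\psi$ are monotone in the same direction, and $\mathrm{Cov}(\varphi(\xi), \psi(\xi)) \le 0$ if $\varphi$
and $\psi$ are monotone in the opposite direction.

Proof of Lemma 4.4. Denote by $\mu$ the distribution of $\xi$. Then

$$\mathrm{Cov}(\varphi(\xi), \psi(\xi)) = \int_{\mathcal{I}} \varphi\psi\,d\mu - \int_{\mathcal{I}} \varphi\,d\mu\int_{\mathcal{I}} \psi\,d\mu = \frac{1}{2}\int_{\mathcal{I}}\int_{\mathcal{I}} (\varphi(t) - \varphi(s))(\psi(t) - \psi(s))\,d\mu(s)\,d\mu(t).$$

The conclusion of the lemma is deduced directly from this expression. □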
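
The symmetrized covariance identity behind this lemma can be checked exactly on a finite distribution. In the sketch below, the support, the probabilities and the test functions are hypothetical choices made only for illustration; the cubic and the indicator are picked because the lemma is applied to maps of that kind.

```python
def expectation(f, support, probs):
    """E[f(xi)] for a finitely supported random variable xi."""
    return sum(f(t) * p for t, p in zip(support, probs))

def covariance(phi, psi, support, probs):
    """Cov(phi(xi), psi(xi)) computed from the definition."""
    return (expectation(lambda t: phi(t) * psi(t), support, probs)
            - expectation(phi, support, probs) * expectation(psi, support, probs))

def symmetrized_form(phi, psi, support, probs):
    """(1/2) * sum_{s,t} (phi(t) - phi(s)) * (psi(t) - psi(s)) * mu(s) * mu(t)."""
    return 0.5 * sum((phi(t) - phi(s)) * (psi(t) - psi(s)) * p * q
                     for s, p in zip(support, probs)
                     for t, q in zip(support, probs))

support = [0.0, 1.0, 2.0, 5.0]
probs = [0.1, 0.4, 0.3, 0.2]
cube = lambda t: t ** 3                        # non-decreasing on [0, infinity)
tail = lambda t: 1.0 if t > 1.5 else 0.0       # non-decreasing indicator
decreasing = lambda t: -t                      # monotone in the opposite direction

same_direction = covariance(cube, tail, support, probs)
opposite_direction = covariance(cube, decreasing, support, probs)
```

Each term of the double sum has a fixed sign when the two functions are monotone, which is exactly how the sign of the covariance follows.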

Since the maps $t \mapsto t^3$ and $t \mapsto 1(t > \beta^{-1}/2)$ are non-decreasing on $[0, \infty)$,
the second term on the right side of (16) is not larger than the first term.
Hence

$$(B) \le C\beta^2\delta^{-1}\sum_{i=1}^n E\Big[\max_{1 \le j \le p}|X_{ij}|^3 \cdot 1\Big(\max_{1 \le j \le p}|X_{ij}| > \beta^{-1}/2\Big)\Big] = C\beta^2\delta^{-1}B_3.$$

Therefore,

$$|E[f(S_n)] - E[f(T_n)]| \le C\beta\delta^{-1}(B_1 + \beta(B_2 + B_3)).$$
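Step 4: Combining Steps 1-3, one has

$$P(Z \in A) \le (1 - \varepsilon)^{-1}E[g \circ F_\beta(T_n)] + \frac{C\beta\delta^{-1}\{B_1 + \beta(B_2 + B_3)\}}{1 - \varepsilon} \quad (\text{by construction of } g)$$

$$\le P(Z^* \in A^{2e_\beta + 3\delta}) + \frac{\varepsilon + C\beta\delta^{-1}\{B_1 + \beta(B_2 + B_3)\}}{1 - \varepsilon}. \quad (\text{by Lemma 4.2 and (8)})$$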



This completes the proof. □

5. Inequalities for empirical processes

In this section, we shall prove some inequalities for empirical processes
that will be used in the proof of Theorem 2.1. These inequalities are of
interest in their own right. Consider the same setup as in Section 2, i.e., let
$X_1, \dots, X_n$ be i.i.d. random variables taking values in a measurable space
$(S, \mathcal{S})$ with common distribution $P$. Let $\mathcal{F}$ be a pointwise measurable class
of functions $S \to \mathbb{R}$, to which a measurable envelope $F$ is attached. Consider
the empirical process $\mathbb{G}_n f = n^{-1/2}\sum_{i=1}^n (f(X_i) - Pf)$. Let $\sigma^2 > 0$ be any
positive constant such that

$$\sup_{f \in \mathcal{F}} Pf^2 \le \sigma^2 \le \|F\|_{P,2}^2.$$

Let

$$M = \max_{1 \le i \le n} F(X_i).$$

Theorem 5.1. Suppose that $F \in L^q(P)$ for some $q \ge 2$. Then for every
$t \ge 1$, with probability $> 1 - t^{-q/2}$,

$$\|\mathbb{G}_n\|_{\mathcal{F}} \le (1 + \alpha)E[\|\mathbb{G}_n\|_{\mathcal{F}}] + K(q)\Big[(\sigma + n^{-1/2}\|M\|_q)\sqrt{t} + \alpha^{-1}n^{-1/2}\|M\|_2\,t\Big], \quad \forall\alpha > 0,$$

where $K(q) > 0$ is a constant depending only on $q$.



Remark 5.1. Theorem 5.1 gives a deviation inequality for suprema of
empirical processes that only requires finite moments of envelope functions.
Talagrand's inequality gives an exponential type deviation inequality
for the supremum but requires uniform boundedness of $\mathcal{F}$, which is violated
in our applications. Another known deviation inequality similar in nature to
Theorem 5.1 is a Fuk-Nagaev type inequality proved in [14] (see their
Theorem 3.1). For the purpose of this paper, however, Theorem 5.1 is more
suitable.

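Proof of Theorem 5.1. The theorem essentially follows from , Theorem
12, which states that

$$\|(\|\mathbb{G}_n\|_{\mathcal{F}} - E[\|\mathbb{G}_n\|_{\mathcal{F}}])_+\|_q \lesssim \sqrt{q}(\Sigma + \sigma) + q\,n^{-1/2}(\|M\|_q + \sigma),$$

where $\Sigma^2 = E[\|n^{-1}\sum_{i=1}^n (f(X_i) - Pf)^2\|_{\mathcal{F}}]$. By Lemma 7 of the same paper,

$$\Sigma^2 \le \sigma^2 + 64n^{-1/2}\|M\|_2 E[\|\mathbb{G}_n\|_{\mathcal{F}}] + 32n^{-1}\|M\|_2^2.$$

Hence, using the simple inequality $2\sqrt{ab} \le \beta a + \beta^{-1}b$, $\forall\beta > 0$, one has

$$\|(\|\mathbb{G}_n\|_{\mathcal{F}} - E[\|\mathbb{G}_n\|_{\mathcal{F}}])_+\|_q \lesssim \sqrt{q}\beta E[\|\mathbb{G}_n\|_{\mathcal{F}}] + \sqrt{q}(1 + \beta^{-1})n^{-1/2}\|M\|_2 + \sqrt{q}\sigma + q\,n^{-1/2}(\|M\|_q + \sigma).$$

Therefore, by Markov's inequality, for every $t \ge 1$, with probability $> 1 - t^{-q}$,

$$\|\mathbb{G}_n\|_{\mathcal{F}} \le E[\|\mathbb{G}_n\|_{\mathcal{F}}] + (\|\mathbb{G}_n\|_{\mathcal{F}} - E[\|\mathbb{G}_n\|_{\mathcal{F}}])_+$$

$$\le (1 + C\sqrt{q}\beta t)E[\|\mathbb{G}_n\|_{\mathcal{F}}] + C\sqrt{q}(1 + \beta^{-1})n^{-1/2}\|M\|_2\,t + C\sqrt{q}\sigma t + Cq\,n^{-1/2}(\|M\|_q + \sigma)t, \quad \forall\beta > 0.$$

The final conclusion follows from taking $\beta = C^{-1}q^{-1/2}t^{-1}\alpha$. □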

Theorem 5.1 will be complemented with the following moment inequality
for suprema of empirical processes, which is an extension of [41], Theorem
2.1, to possibly unbounded classes of functions.

Theorem 5.2. Suppose that $F \in L^2(P)$. Let $\delta = \sigma/\|F\|_{P,2}$. Then

$$E[\|\mathbb{G}_n\|_{\mathcal{F}}] \lesssim J(\delta, \mathcal{F}, F)\|F\|_{P,2} + \frac{\|M\|_2 J^2(\delta, \mathcal{F}, F)}{\delta^2\sqrt{n}}.$$

We give a full proof of Theorem 5.2 for the sake of completeness. We first
prove the following preliminary lemma.

Lemma 5.1. Write $J(\delta)$ for $J(\delta, \mathcal{F}, F)$. Then (i) the map $\delta \mapsto J(\delta)$ is
concave; (ii) $J(c\delta) \le cJ(\delta)$, $\forall c \ge 1$; (iii) the map $\delta \mapsto J(\delta)/\delta$ is
non-increasing; (iv) the map $\mathbb{R}_+ \times (0, \infty) \ni (x, y) \mapsto J(\sqrt{x/y})\sqrt{y}$ is concave.



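Proof. Let $\lambda(\varepsilon) = \sup_Q\sqrt{1 + \log N(\mathcal{F}, e_Q, \varepsilon\|F\|_{Q,2})}$. Part (i) follows from
the fact that the map $\varepsilon \mapsto \lambda(\varepsilon)$ is non-increasing. Part (ii) follows from the
inequality

$$J(c\delta) = \int_0^{c\delta}\lambda(\varepsilon)\,d\varepsilon = c\int_0^{\delta}\lambda(c\varepsilon)\,d\varepsilon \le c\int_0^{\delta}\lambda(\varepsilon)\,d\varepsilon = cJ(\delta).$$

Part (iii) follows from the identity

$$\frac{J(\delta)}{\delta} = \int_0^1 \lambda(\delta\varepsilon)\,d\varepsilon.$$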

The proof of part (iv) uses some facts in convex analysis. Proofs of the 
following lemmas can be found in, e.g., |J|, Section 3.2. 

Lemma 5.2. Let $D$ be a convex subset of $\mathbb{R}^n$, and let $f : D \to \mathbb{R}$ be a
concave function. Then the perspective $(x, t) \mapsto tf(x/t)$, $\{(x, t) \in \mathbb{R}^{n+1} :
x/t \in D, t > 0\} \to \mathbb{R}$, is also concave.

Lemma 5.3. Let $D_1$ be a convex subset of $\mathbb{R}^n$, and let $g_i : D_1 \to \mathbb{R}$, $1 \le
i \le k$, be concave functions. Let $D_2$ denote the convex hull of the set
$\{(g_1(x), \dots, g_k(x)) : x \in D_1\}$. Let $h : D_2 \to \mathbb{R}$ be concave and nondecreasing
in each coordinate. Then $f(x) = h(g_1(x), \dots, g_k(x))$, $D_1 \to \mathbb{R}$, is concave.

Let $h(s, t) = J(s/t)t$, $g_1(x, y) = \sqrt{x}$ and $g_2(x, y) = \sqrt{y}$. Then $h$ is concave
and nondecreasing in each coordinate, and $g_i$, $i = 1, 2$, are concave. Hence
$J(\sqrt{x/y})\sqrt{y} = h(g_1(x, y), g_2(x, y))$ is concave. □

We will use a version of the contraction principle for Rademacher averages.
Recall that a Rademacher random variable is a random variable taking values $\pm 1$
with equal probability.

Lemma 5.4. Let $\varepsilon_1, \dots, \varepsilon_n$ be i.i.d. Rademacher random variables
independent of $X_1, \dots, X_n$. Then

$$E\Big[\Big\|\sum_{i=1}^n \varepsilon_i f^2(X_i)\Big\|_{\mathcal{F}}\Big] \le 4E\Big[M\Big\|\sum_{i=1}^n \varepsilon_i f(X_i)\Big\|_{\mathcal{F}}\Big].$$

Proof. See [27], Theorem 4.12, and the discussion following the theorem. □

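We will also use the following form of the Hoffmann-Jørgensen inequality.

Theorem 5.3. Let $\varepsilon_1, \dots, \varepsilon_n$ be i.i.d. Rademacher random variables
independent of $X_1, \dots, X_n$. Then for every $1 < q < \infty$,

$$\Big(E\Big[\Big\|\sum_{i=1}^n \varepsilon_i f(X_i)\Big\|_{\mathcal{F}}^q\Big]\Big)^{1/q} \lesssim_q E\Big[\Big\|\sum_{i=1}^n \varepsilon_i f(X_i)\Big\|_{\mathcal{F}}\Big] + \|M\|_q.$$

Proof. See, e.g., [27], Theorem 6.20. □

We are now in position to prove Theorem 5.2.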



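Proof of Theorem 5.2. Without loss of generality, we may assume that $F$
is everywhere positive. Let $P_n$ denote the empirical distribution that
assigns probability $n^{-1}$ to each $X_i$. Let $\sigma_n^2 = \sup_{f \in \mathcal{F}} n^{-1}\sum_{i=1}^n f^2(X_i)$. For
i.i.d. Rademacher random variables $\varepsilon_1, \dots, \varepsilon_n$ independent of $X_1, \dots, X_n$,
the symmetrization inequality gives

$$E[\|\mathbb{G}_n\|_{\mathcal{F}}] \le 2E\Big[\Big\|\frac{1}{\sqrt{n}}\sum_{i=1}^n \varepsilon_i f(X_i)\Big\|_{\mathcal{F}}\Big].$$

Here the standard entropy integral inequality gives

$$E\Big[\Big\|\frac{1}{\sqrt{n}}\sum_{i=1}^n \varepsilon_i f(X_i)\Big\|_{\mathcal{F}} \,\Big|\, X_1, \dots, X_n\Big] \le C\int_0^{\sigma_n}\sqrt{1 + \log N(\mathcal{F}, e_{P_n}, \varepsilon)}\,d\varepsilon$$

$$\le C\|F\|_{P_n,2}\int_0^{\sigma_n/\|F\|_{P_n,2}}\sqrt{1 + \log N(\mathcal{F}, e_{P_n}, \varepsilon\|F\|_{P_n,2})}\,d\varepsilon$$

$$\le C\|F\|_{P_n,2}\,J(\sigma_n/\|F\|_{P_n,2}).$$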
Hence by Lemma 5.1 (iv) and Jensen's inequality,

$$Z := E\Big[\Big\|\frac{1}{\sqrt{n}}\sum_{i=1}^n \varepsilon_i f(X_i)\Big\|_{\mathcal{F}}\Big] \le C\|F\|_{P,2}\,J\Big(\sqrt{E[\sigma_n^2]}\Big/\|F\|_{P,2}\Big).$$

By the symmetrization inequality, the contraction principle (Lemma 5.4)
and the Cauchy-Schwarz inequality,

$$E[\sigma_n^2] \le \sigma^2 + E[\|E_n[(f^2(X_i) - Pf^2)]\|_{\mathcal{F}}] \le \sigma^2 + 2E[\|E_n[\varepsilon_i f^2(X_i)]\|_{\mathcal{F}}]$$

$$\le \sigma^2 + 8E[M\|E_n[\varepsilon_i f(X_i)]\|_{\mathcal{F}}] \le \sigma^2 + 8\|M\|_2\big(E[\|E_n[\varepsilon_i f(X_i)]\|_{\mathcal{F}}^2]\big)^{1/2}.$$

Here by the Hoffmann-Jørgensen inequality (Theorem 5.3),

$$\big(E[\|E_n[\varepsilon_i f(X_i)]\|_{\mathcal{F}}^2]\big)^{1/2} \lesssim E[\|E_n[\varepsilon_i f(X_i)]\|_{\mathcal{F}}] + n^{-1}\|M\|_2,$$

so that

$$\sqrt{E[\sigma_n^2]} \le C\|F\|_{P,2}(\Delta \vee \sqrt{DZ}),$$

where $\Delta^2 := \max\{\sigma^2, n^{-1}\|M\|_2^2\}/\|F\|_{P,2}^2 \ge \delta^2$ and $D := \|M\|_2/(\sqrt{n}\|F\|_{P,2}^2)$.
Therefore, using Lemma 5.1 (ii), we have

$$Z \le C\|F\|_{P,2}\,J(\Delta \vee \sqrt{DZ}).$$

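We consider the following two cases:

(i) $\sqrt{DZ} \le \Delta$. In this case, $J(\Delta \vee \sqrt{DZ}) \le J(\Delta)$, so that $Z \le
C\|F\|_{P,2}J(\Delta)$. Since the map $\delta \mapsto J(\delta)/\delta$ is non-increasing (Lemma 5.1
(iii)),

$$J(\Delta) = \Delta\frac{J(\Delta)}{\Delta} \le \Delta\frac{J(\delta)}{\delta} = \max\Big\{J(\delta), \frac{\|M\|_2 J(\delta)}{\sqrt{n}\|F\|_{P,2}\,\delta}\Big\}.$$

Since $J(\delta)/\delta \ge J(1) \ge 1$, the last expression is bounded by

$$\max\Big\{J(\delta), \frac{\|M\|_2 J^2(\delta)}{\sqrt{n}\|F\|_{P,2}\,\delta^2}\Big\}.$$

(ii) $\sqrt{DZ} > \Delta$. In this case, $J(\Delta \vee \sqrt{DZ}) \le J(\sqrt{DZ})$, and since the map
$\delta \mapsto J(\delta)/\delta$ is non-increasing (Lemma 5.1 (iii)),

$$J(\sqrt{DZ}) = \sqrt{DZ}\frac{J(\sqrt{DZ})}{\sqrt{DZ}} \le \sqrt{DZ}\frac{J(\Delta)}{\Delta} \le \sqrt{DZ}\frac{J(\delta)}{\delta}.$$

Therefore,

$$Z \le C\|F\|_{P,2}\sqrt{DZ}\,\frac{J(\delta)}{\delta},$$

that is

$$Z \le C\|F\|_{P,2}^2 D\frac{J^2(\delta)}{\delta^2} = \frac{C\|M\|_2 J^2(\delta)}{\sqrt{n}\,\delta^2}.$$

This completes the proof. □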

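The bound in Theorem 5.2 will be explicit as soon as a suitable bound
on the covering number is available. For example, the following corollary is
an extension of [18], Proposition 2.1.

Corollary 5.1. Consider the same setup as in Theorem 5.2. Suppose that
there exist constants $a \ge e$ and $v \ge 1$ such that

$$\sup_Q N(\mathcal{F}, e_Q, \varepsilon\|F\|_{Q,2}) \le (a/\varepsilon)^v, \quad 0 < \forall\varepsilon \le 1.$$

Then

$$E[\|\mathbb{G}_n\|_{\mathcal{F}}] \lesssim \sqrt{v\sigma^2\log\Big(\frac{a\|F\|_{P,2}}{\sigma}\Big)} + \frac{v\|M\|_2}{\sqrt{n}}\log\Big(\frac{a\|F\|_{P,2}}{\sigma}\Big).$$

Proof. We observe that

$$J(\delta) \le \int_0^{\delta}\sqrt{1 + v\log(a/\varepsilon)}\,d\varepsilon \le a\sqrt{v}\int_{a/\delta}^{\infty}\frac{\sqrt{1 + \log\varepsilon}}{\varepsilon^2}\,d\varepsilon.$$

An integration by parts gives

$$\int_c^{\infty}\frac{\sqrt{1 + \log\varepsilon}}{\varepsilon^2}\,d\varepsilon = \Big[-\frac{\sqrt{1 + \log\varepsilon}}{\varepsilon}\Big]_c^{\infty} + \frac{1}{2}\int_c^{\infty}\frac{1}{\varepsilon^2\sqrt{1 + \log\varepsilon}}\,d\varepsilon,$$

by which we have

$$\int_c^{\infty}\frac{\sqrt{1 + \log\varepsilon}}{\varepsilon^2}\,d\varepsilon \le \frac{2\sqrt{1 + \log c}}{c} \le \frac{2\sqrt{2}\sqrt{\log c}}{c}, \quad \text{if } c \ge e.$$

Since $a/\delta \ge a \ge e$, we have

$$J(\delta) \le 2\sqrt{2v}\,\delta\sqrt{\log(a/\delta)}.$$

Applying Theorem 5.2, we obtain the desired conclusion. □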
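
The closed-form bound on the entropy integral in this proof can be compared with a direct numerical integration. The sketch below is illustrative only: it integrates the majorant $\int_0^\delta \sqrt{1 + v\log(a/\varepsilon)}\,d\varepsilon$ with a midpoint rule, and the grid size and the tested values of $(a, v, \delta)$ are arbitrary assumptions.

```python
import math

def entropy_integral(delta, a, v, steps=20000):
    """Midpoint-rule approximation of int_0^delta sqrt(1 + v*log(a/eps)) d eps."""
    h = delta / steps
    return h * sum(math.sqrt(1.0 + v * math.log(a / ((i + 0.5) * h)))
                   for i in range(steps))

def closed_form_bound(delta, a, v):
    """The bound 2 * sqrt(2 v) * delta * sqrt(log(a / delta)), for 0 < delta <= 1, a >= e."""
    return 2.0 * math.sqrt(2.0 * v) * delta * math.sqrt(math.log(a / delta))

deltas = [0.01, 0.1, 0.5, 1.0]
for a in (math.e, 10.0):
    for v in (1.0, 5.0):
        values = [entropy_integral(d, a, v) for d in deltas]
        # The integral stays below the closed-form bound ...
        assert all(val <= closed_form_bound(d, a, v) for val, d in zip(values, deltas))
        # ... and delta -> J(delta) / delta is non-increasing, as in Lemma 5.1 (iii).
        ratios = [val / d for val, d in zip(values, deltas)]
        assert all(r1 >= r2 for r1, r2 in zip(ratios, ratios[1:]))
```

The integrand has an integrable logarithmic singularity at zero, so the midpoint rule, which never evaluates the endpoint, is adequate for a rough check of this kind.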



6. Proof of Theorem 2.1

We make use of Lemma 4.1 to prove the theorem. Construct a tight
Gaussian random element $G_P$ in $\ell^\infty(\mathcal{F})$ given in Lemma 2.1, independent
of $X_1, \dots, X_n$. We note that one can extend $G_P$ to the linear hull
of $\mathcal{F}$ in such a way that $G_P$ has linear sample paths [see [13], Theorem
3.1.1]. Let $\{f_1, \dots, f_N\}$ be a minimal $\varepsilon\|F\|_{P,2}$-net of $(\mathcal{F}, e_P)$ with $N =
N(\mathcal{F}, e_P, \varepsilon\|F\|_{P,2})$. Then for every $f \in \mathcal{F}$, there is a function $f_j$, $1 \le j \le N$,
such that $e_P(f, f_j) < \varepsilon\|F\|_{P,2}$. Define

$$Z^{\varepsilon} = \max_{1 \le j \le N}\mathbb{G}_n f_j, \quad Z^* = \sup_{f \in \mathcal{F}} G_P f, \quad Z^{*\varepsilon} = \max_{1 \le j \le N} G_P f_j,$$

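and $\mathcal{F}_\varepsilon = \{f - g : f, g \in \mathcal{F}, e_P(f, g) < \varepsilon\|F\|_{P,2}\}$. It is standard to see that

$$|Z^{\varepsilon} - Z| \le \|\mathbb{G}_n\|_{\mathcal{F}_\varepsilon}, \quad |Z^{*\varepsilon} - Z^*| \le \|G_P\|_{\mathcal{F}_\varepsilon}.$$

We shall apply Corollary 4.1 to $Z^{\varepsilon}$. Recall that $\log(N \vee n) = H_n(\varepsilon)$.
Then for every Borel subset $A$ of $\mathbb{R}$ and $\delta > 0$,

$$P(Z^{\varepsilon} \in A) \le P(Z^{*\varepsilon} \in A^{16\delta}) + C\big[\delta^{-2}\{B_1 + \delta^{-1}(B_2 + B_4)H_n(\varepsilon)\}H_n(\varepsilon) + n^{-1}\log n\big],$$

where

$$B_1 = n^{-1}E\Big[\max_{1 \le j,k \le N}\Big|\sum_{i=1}^n (f_j(X_i)f_k(X_i) - P(f_jf_k))\Big|\Big],$$

$$B_2 = n^{-3/2}E\Big[\max_{1 \le j \le N}\sum_{i=1}^n |f_j(X_i)|^3\Big],$$

$$B_4 = n^{-1/2}E\Big[\max_{1 \le j \le N}|f_j(X_1)|^3 \cdot 1\Big(\max_{1 \le j \le N}|f_j(X_1)| > \delta\sqrt{n}H_n(\varepsilon)^{-1}\Big)\Big].$$

Clearly $B_1 \le n^{-1/2}E[\|\mathbb{G}_n\|_{\mathcal{F} \cdot \mathcal{F}}]$, $B_2 \le n^{-1/2}\kappa^3$, and

$$B_4 \le n^{-1/2}P[F^3 1(F > \delta\sqrt{n}H_n(\varepsilon)^{-1})].$$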
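Hence choosing $\delta > 0$ in such a way that

$$C\delta^{-2}n^{-1/2}E[\|\mathbb{G}_n\|_{\mathcal{F} \cdot \mathcal{F}}]H_n(\varepsilon) \le \frac{\gamma}{4}, \quad C\delta^{-3}n^{-1/2}\kappa^3 H_n^2(\varepsilon) \le \frac{\gamma}{4},$$

that is,

$$\delta \ge C\max\big\{\gamma^{-1/2}n^{-1/4}(E[\|\mathbb{G}_n\|_{\mathcal{F} \cdot \mathcal{F}}])^{1/2}H_n^{1/2}(\varepsilon),\ \gamma^{-1/3}n^{-1/6}\kappa H_n^{2/3}(\varepsilon)\big\},$$

one has

$$P(Z^{\varepsilon} \in A) \le P(Z^{*\varepsilon} \in A^{16\delta}) + \frac{\gamma}{2} + \frac{\gamma}{4}\kappa^{-3}P[F^3 1(F > \delta\sqrt{n}H_n(\varepsilon)^{-1})] + \frac{C\log n}{n}.$$

Note that $\delta \ge c\gamma^{-1/3}n^{-1/6}\kappa H_n^{2/3}(\varepsilon)$, so that

$$P[F^3 1(F > \delta\sqrt{n}H_n(\varepsilon)^{-1})] \le P[F^3 1(F/\kappa > c\gamma^{-1/3}n^{1/3}H_n(\varepsilon)^{-1/3})].$$

Therefore,

$$P(Z^{\varepsilon} \in A) \le P(Z^{*\varepsilon} \in A^{16\delta}) + \frac{\gamma}{2} + \frac{\gamma}{4}P[(F/\kappa)^3 1(F/\kappa > c\gamma^{-1/3}n^{1/3}H_n(\varepsilon)^{-1/3})] + \frac{C\log n}{n}$$

$$=: P(Z^{*\varepsilon} \in A^{16\delta}) + \frac{\gamma}{2} + \text{error}. \quad (17)$$

By Theorem 5.1, with probability $> 1 - \gamma/4$,

$$\|\mathbb{G}_n\|_{\mathcal{F}_\varepsilon} \lesssim_q E[\|\mathbb{G}_n\|_{\mathcal{F}_\varepsilon}] + (\varepsilon\|F\|_{P,2} + n^{-1/2}\|M\|_q)\gamma^{-1/q} + n^{-1/2}\|M\|_2\gamma^{-2/q}.$$

Here, noting that $J(\delta, \mathcal{F}_\varepsilon, 2F) \lesssim J(\delta, \mathcal{F}, F) = J(\delta)$, by Theorem 5.2 we
have

$$E[\|\mathbb{G}_n\|_{\mathcal{F}_\varepsilon}] \lesssim J(\varepsilon)\|F\|_{P,2} + n^{-1/2}\varepsilon^{-2}J^2(\varepsilon)\|M\|_2.$$

Hence

$$a := (1 - \gamma/4)\text{-quantile of } \|\mathbb{G}_n\|_{\mathcal{F}_\varepsilon} \lesssim_q J(\varepsilon)\|F\|_{P,2} + n^{-1/2}\varepsilon^{-2}J^2(\varepsilon)\|M\|_2 + (\varepsilon\|F\|_{P,2} + n^{-1/2}\|M\|_q)\gamma^{-1/q} + n^{-1/2}\|M\|_2\gamma^{-2/q}. \quad (18)$$

On the other hand, by the Borell-Sudakov-Tsirel'son inequality (4(3, Proposition
A.1], with probability $> 1 - \gamma/4$,

$$\|G_P\|_{\mathcal{F}_\varepsilon} \le E[\|G_P\|_{\mathcal{F}_\varepsilon}] + \varepsilon\|F\|_{P,2}\sqrt{2\log(4/\gamma)}.$$

We can bound the expectation $E[\|G_P\|_{\mathcal{F}_\varepsilon}]$ by Dudley's inequality [[40], Corollary
2.2.8]: $E[\|G_P\|_{\mathcal{F}_\varepsilon}] \lesssim J(\varepsilon)\|F\|_{P,2}$. Hence

$$b := (1 - \gamma/4)\text{-quantile of } \|G_P\|_{\mathcal{F}_\varepsilon} \lesssim J(\varepsilon)\|F\|_{P,2} + \varepsilon\|F\|_{P,2}\sqrt{\log(1/\gamma)}. \quad (19)$$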

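Therefore, for every Borel subset $A$ of $\mathbb{R}$,

$$P(Z \in A) \le P(Z^{\varepsilon} \in A^{a}) + \frac{\gamma}{4} \quad (\text{by (18)})$$

$$\le P(Z^{*\varepsilon} \in A^{a + 16\delta}) + \frac{3}{4}\gamma + \text{error} \quad (\text{by (17)})$$

$$\le P(Z^* \in A^{a + b + 16\delta}) + \gamma + \text{error}. \quad (\text{by (19)})$$

The conclusion follows from Lemma 4.1. □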

Appendix A. Additional proofs 

A.1. Proof of Lemma 2.1. Let $G_P$ be a centered Gaussian process indexed
by $\mathcal{F}$ with covariance function $E[G_P(f)G_P(g)] = P(f - Pf)(g - Pg)$. Recall
that $\mathcal{F}$ is $P$-pre-Gaussian if and only if $(\mathcal{F}, \rho_P)$ is totally bounded and $G_P$
has a version that has sample paths almost surely uniformly $\rho_P$-continuous,
where $\rho_P(f, g) := \sqrt{\mathrm{Var}_P(f - g)}$ [40, Example 1.3.10]. Dudley's criterion
for sample continuity of Gaussian processes states that when

$$\int_0^{\infty}\sqrt{\log N(\mathcal{F}, \rho_P, \varepsilon)}\,d\varepsilon < \infty,$$

there is a version of $G_P$ that has sample paths uniformly $\rho_P$-continuous
[iH p. 100-101]. The lemma readily follows from these observations and the
simple fact $\rho_P \le e_P$. □

A.2. Proof of Lemma 2.2. Before proving Lemma 2.2, we shall prepare
the following lemmas.

Lemma A.1. Let $\mathcal{F}$ and $\mathcal{G}$ be classes of measurable functions $S \to \mathbb{R}$, to
which measurable envelopes $F$ and $G$ are attached, respectively. Denote by
$\mathcal{F} \cdot \mathcal{G}$ the pointwise product of $\mathcal{F}$ and $\mathcal{G}$. Then

$$\sup_Q N(\mathcal{F} \cdot \mathcal{G}, e_Q, 2\varepsilon\|FG\|_{Q,2}) \le \sup_Q N(\mathcal{F}, e_Q, \varepsilon\|F\|_{Q,2})\,\sup_Q N(\mathcal{G}, e_Q, \varepsilon\|G\|_{Q,2}).$$

Proof. Implicit in [40], Section 2.10.3. Hence we omit the details. □

Lemma A.2. Let $\mathcal{F}$ be a class of measurable functions $S \to \mathbb{R}$, to which a
measurable envelope $F$ is attached. For every $q \ge 1$, let $\mathcal{F}(q) = \{|f|^q : f \in
\mathcal{F}\}$. Then

$$\sup_Q N(\mathcal{F}(q), e_Q, q\varepsilon\|F^q\|_{Q,2}) \le \sup_Q N(\mathcal{F}, e_Q, \varepsilon\|F\|_{Q,2}), \quad 0 < \forall\varepsilon \le 1.$$

Proof. Let us write $\mathcal{F}/F = \{f/F : f \in \mathcal{F}\}$ with the convention $0/0 = 0$.
We first point out that

$$\sup_Q N(\mathcal{F}/F, e_Q, \varepsilon) \le \sup_Q N(\mathcal{F}, e_Q, \varepsilon\|F\|_{Q,2}). \quad (20)$$

Indeed, for a given distribution $Q$, let $\{f_1, \dots, f_N\}$ be an $\varepsilon\|F\|_{Q,2}$-net of $\mathcal{F}$.
Define the new distribution $Q'$ by $dQ' = (F^2/\|F\|_{Q,2}^2)\,dQ$. Then for every
$f \in \mathcal{F}$, there is a function $f_j$, $1 \le j \le N$, with $\|f - f_j\|_{Q,2} \le \varepsilon\|F\|_{Q,2}$, so
that

$$\|f/F - f_j/F\|_{Q',2} \le \|f - f_j\|_{Q,2}/\|F\|_{Q,2} \le \varepsilon.$$

Returning to the original problem, for a given distribution $Q$, define the
new distribution $Q'$ by $dQ' = (F^{2q}/\|F^q\|_{Q,2}^2)\,dQ$. For every $f, g \in \mathcal{F}$,

$$\big||f|^q - |g|^q\big| \le q\max\{|f|^{q-1}, |g|^{q-1}\}|f - g| \le qF^{q-1}|f - g| = qF^q|f/F - g/F|.$$

Hence

$$\||f|^q - |g|^q\|_{Q,2} \le q\|F^q|f/F - g/F|\|_{Q,2} \le q\|F^q\|_{Q,2}\|f/F - g/F\|_{Q',2}.$$

This implies that

$$N(\mathcal{F}(q), e_Q, q\varepsilon\|F^q\|_{Q,2}) \le N(\mathcal{F}/F, e_{Q'}, \varepsilon).$$

Combining this with (20) leads to the desired conclusion. □



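Proof of Lemma 2.2. The second inequality is deduced from Theorem 5.2
together with the covering number estimate (3). Hence we shall prove the
first inequality. We first observe that

$$E_n[|f(X_i)|^3] \le P|f|^3 + n^{-1/2}\mathbb{G}_n(|f|^3),$$

by which we have

$$E[\|E_n[|f(X_i)|^3]\|_{\mathcal{F}}] \le \sup_{f \in \mathcal{F}} P|f|^3 + n^{-1/2}E[\|\mathbb{G}_n(|f|^3)\|_{\mathcal{F}}].$$

Let $\varepsilon_1, \dots, \varepsilon_n$ be i.i.d. Rademacher random variables independent of $X_1, \dots, X_n$.
By the symmetrization inequality,

$$E[\|\mathbb{G}_n(|f|^3)\|_{\mathcal{F}}] \le 2E\Big[\Big\|\frac{1}{\sqrt{n}}\sum_{i=1}^n \varepsilon_i|f(X_i)|^3\Big\|_{\mathcal{F}}\Big].$$

By the contraction principle together with the Cauchy-Schwarz inequality,

$$E\Big[\Big\|\sum_{i=1}^n \varepsilon_i|f(X_i)|^3\Big\|_{\mathcal{F}}\Big] \lesssim E\Big[M^{3/2}\Big\|\sum_{i=1}^n \varepsilon_i|f(X_i)|^{3/2}\Big\|_{\mathcal{F}}\Big] \le \|M^{3/2}\|_2\Big(E\Big[\Big\|\sum_{i=1}^n \varepsilon_i|f(X_i)|^{3/2}\Big\|_{\mathcal{F}}^2\Big]\Big)^{1/2}.$$

Moreover, by the Hoffmann-Jørgensen inequality,

$$\Big(E\Big[\Big\|\sum_{i=1}^n \varepsilon_i|f(X_i)|^{3/2}\Big\|_{\mathcal{F}}^2\Big]\Big)^{1/2} \lesssim E\Big[\Big\|\sum_{i=1}^n \varepsilon_i|f(X_i)|^{3/2}\Big\|_{\mathcal{F}}\Big] + \|M^{3/2}\|_2.$$

By Theorem 5.2 together with Lemma A.2, we have

$$E\Big[\Big\|\frac{1}{\sqrt{n}}\sum_{i=1}^n \varepsilon_i|f(X_i)|^{3/2}\Big\|_{\mathcal{F}}\Big] \lesssim J(\delta_3^{3/2}, \mathcal{F}, F)\|F^{3/2}\|_{P,2} + \frac{\|M^{3/2}\|_2 J^2(\delta_3^{3/2}, \mathcal{F}, F)}{\sqrt{n}\,\delta_3^3},$$

by which we have

$$E[\|E_n[|f(X_i)|^3]\|_{\mathcal{F}}] - \sup_{f \in \mathcal{F}} P|f|^3 \lesssim n^{-1}\|M\|_3^3 + n^{-1/2}\|M\|_3^{3/2}\Big[J(\delta_3^{3/2}, \mathcal{F}, F)\|F\|_{P,3}^{3/2} + \frac{\|M\|_3^{3/2}J^2(\delta_3^{3/2}, \mathcal{F}, F)}{\sqrt{n}\,\delta_3^3}\Big].$$

A further simplification is possible. By Lemma 5.1 (iii), the map $\delta \mapsto
J(\delta, \mathcal{F}, F)/\delta$ is non-increasing, so that $J^2(\delta_3^{3/2}, \mathcal{F}, F)/\delta_3^3 \ge J^2(1, \mathcal{F}, F) \ge 1$.
Hence the first term on the right side is not larger than $\|M\|_3^3 J^2(\delta_3^{3/2}, \mathcal{F}, F)/(n\delta_3^3)$.
This completes the proof. □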

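A.3. Proofs of Propositions 3.1-3.3.

Proof of Proposition 3.1. For given $x \in \mathcal{I}$, $g \in \mathcal{G}$ and $h > 0$, define

$$f_{x,g,h}(y, t) = c_g(x)g(y)k(h^{-1}(t - x)), \quad (y, t) \in \mathcal{Y} \times \mathbb{R}^d.$$

Consider the class of functions $\mathcal{F}_n = \{f_{x,g,h_n} : (x, g) \in \mathcal{I} \times \mathcal{G}\}$. We shall
apply Theorem 2.1 to $\mathcal{F}_n$. Let $Z_n = \sup_{f \in \mathcal{F}_n}\mathbb{G}_n f$. We first note that
$|f_{x,g,h}(y, t)| \le C_{\mathcal{I} \times \mathcal{G}}\,b\|k\|_\infty =: F$. It is not difficult to see that $\mathcal{F}_n$ is pointwise
measurable for every $n \ge 1$. Using Lemma A.1, we can prove that there are
constants $a > 0$ and $v > 0$ such that

$$\sup_Q N(\mathcal{F}_n, e_Q, \varepsilon C_{\mathcal{I} \times \mathcal{G}}\,b\|k\|_\infty) \le (a/\varepsilon)^v, \quad 0 < \forall\varepsilon \le 1,\ \forall n \ge 1. \quad (21)$$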



30 CHERNOZHUKOV, CHETVERIKOV, AND KATO 

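Hence for every $n \ge 1$, $\mathcal{F}_n$ is pre-Gaussian and there is a tight Gaussian
random element $G_n$ in $\ell^\infty(\mathcal{F}_n)$ with mean zero and covariance function

$$E[G_n(f)G_n(\check{f})] = \mathrm{Cov}(f(Y_1, X_1), \check{f}(Y_1, X_1)), \quad f, \check{f} \in \mathcal{F}_n.$$

To apply Theorem 2.1, we make some complementary calculations. By
(21), $J(\delta, \mathcal{F}_n, F) = O(\delta\sqrt{\log(1/\delta)})$ as $\delta \to 0$ uniformly in $n$. Moreover,

$$E[|f_{x,g,h_n}(Y_1, X_1)|^3] = |c_g(x)|^3\int E[|g(Y_1)|^3 \mid X_1 = t]\,|k(h_n^{-1}(t - x))|^3 p(t)\,dt$$

$$= |c_g(x)|^3 h_n^d\int E[|g(Y_1)|^3 \mid X_1 = x + h_n t]\,|k(t)|^3 p(x + h_n t)\,dt \le C_{\mathcal{I} \times \mathcal{G}}^3 b^3\|p\|_\infty h_n^d\int |k(t)|^3\,dt,$$

and

$$E[|f_{x,g,h_n}(Y_1, X_1)|^4] = |c_g(x)|^4 h_n^d\int E[|g(Y_1)|^4 \mid X_1 = x + h_n t]\,|k(t)|^4 p(x + h_n t)\,dt \le C_{\mathcal{I} \times \mathcal{G}}^4 b^4\|p\|_\infty h_n^d\int |k(t)|^4\,dt.$$

Thus, by using Lemma 2.2, we have

$$E[\|E_n[|f(Y_i, X_i)|^3]\|_{\mathcal{F}_n}] = O(h_n^d + n^{-1}\log n),$$

$$E[\|\mathbb{G}_n\|_{\mathcal{F}_n \cdot \mathcal{F}_n}] = O(h_n^{d/2}\sqrt{\log n} + n^{-1/2}\log n).$$

Choosing $\kappa = \kappa_n = \mathrm{const.} \times (h_n^{d/3} + n^{-1/3}\log^{1/3} n)$, $\varepsilon = \varepsilon_n = n^{-1/6}\kappa_n$ and
$\gamma = \gamma_n = (\log n)^{-1}$, we have, after an elementary calculation,

$$\Delta_n(\varepsilon_n, \gamma_n) = O(n^{-1/6}h_n^{d/3}\log n + n^{-1/4}h_n^{d/4}\log^{5/4} n + n^{-1/2}\log^{3/2} n).$$

Moreover, as $\kappa_n\gamma_n^{-1/3}n^{1/3}H_n(\varepsilon_n)^{-1/3} \to \infty$, for large $n$,

$$1(F > c\kappa_n\gamma_n^{-1/3}n^{1/3}H_n(\varepsilon_n)^{-1/3}) = 0.$$

Therefore, by Theorem 2.1, there is a sequence $\widetilde{Z}_n$ of random variables such
that $\widetilde{Z}_n \stackrel{d}{=} \sup_{f \in \mathcal{F}_n} G_n f$ and as $n \to \infty$,

$$|Z_n - \widetilde{Z}_n| = O_P(n^{-1/6}h_n^{d/3}\log n + n^{-1/4}h_n^{d/4}\log^{5/4} n + n^{-1/2}\log^{3/2} n).$$

This implies the conclusion of the theorem. Indeed, let

$$B_n(x, g) = h_n^{-d/2}G_n(f_{x,g,h_n}), \quad (x, g) \in \mathcal{I} \times \mathcal{G},$$

and $\widetilde{W}_n = h_n^{-d/2}\widetilde{Z}_n$. Then $B_n$ is the desired Gaussian process, and as
$W_n = h_n^{-d/2}Z_n$, we have $\widetilde{W}_n \stackrel{d}{=} \sup_{(x,g) \in \mathcal{I} \times \mathcal{G}} B_n(x, g)$ and

$$|W_n - \widetilde{W}_n| = h_n^{-d/2}|Z_n - \widetilde{Z}_n| = O_P\{(nh_n^d)^{-1/6}\log n + (nh_n^d)^{-1/4}\log^{5/4} n + n^{-1/2}h_n^{-d/2}\log^{3/2} n\}.$$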
This completes the proof. □

Proof of Proposition 3.2. We shall follow the notation used in the proof of
Proposition 3.1. Take $F(y, x) = C_{\mathcal{I} \times \mathcal{G}}\|k\|_\infty G(y)$ as an envelope of $\mathcal{F}_n$. A
version of inequality (21) continues to hold with $C_{\mathcal{I} \times \mathcal{G}}\,b\|k\|_\infty$ replaced by
$\|F\|_{Q,2}$. Let $D = \sup_{x \in \mathbb{R}^d} E[G^4(Y_1) \mid X_1 = x]$. Then we have

$$E[|f_{x,g,h_n}(Y_1, X_1)|^3] \le (1 + D)C_{\mathcal{I} \times \mathcal{G}}^3\|p\|_\infty h_n^d\int |k(t)|^3\,dt,$$

and

$$E[|f_{x,g,h_n}(Y_1, X_1)|^4] \le DC_{\mathcal{I} \times \mathcal{G}}^4\|p\|_\infty h_n^d\int |k(t)|^4\,dt.$$

Thus, using Lemma 2.2, we have

$$E[\|E_n[|f(Y_i, X_i)|^3]\|_{\mathcal{F}_n}] = O(h_n^d + n^{-1+3/q}\log n),$$

$$E[\|\mathbb{G}_n\|_{\mathcal{F}_n \cdot \mathcal{F}_n}] = O(h_n^{d/2}\sqrt{\log n} + n^{-1/2+2/q}\log n).$$

Choosing $\kappa = \kappa_n = \mathrm{const.} \times (h_n^{d/3} + n^{-1/3+1/q}\log^{1/3} n)$, $\varepsilon = \varepsilon_n = n^{-1/6}\kappa_n$
and $\gamma = \gamma_n = (\log n)^{-1}$, we have, after an elementary calculation,

$$\Delta_n(\varepsilon_n, \gamma_n) = O(n^{-1/6}h_n^{d/3}\log n + n^{-1/4}h_n^{d/4}\log^{5/4} n + n^{-1/2+1/q}\log^{3/2} n).$$

We wish to check that

$$E[(F/\kappa_n)^3 1(F/\kappa_n > c\gamma_n^{-1/3}n^{1/3}H_n(\varepsilon_n)^{-1/3})] = o(1).$$

Indeed, the left side is bounded by

$$\kappa_n^{-3}(c\gamma_n^{-1/3}\kappa_n n^{1/3}H_n(\varepsilon_n)^{-1/3})^{3-q}E[F^q] = O(\kappa_n^{-q}n^{1-q/3}) = o(1).$$

The rest of the proof is the same as in the previous one. □

Proof of Proposition 3.3. We only deal with case (ii). The proof for case (i)
is similar. For given $x \in \mathcal{I}$ and $K \ge 1$, define

$$f_{x,K}(\eta, t) = \eta\,\alpha_K(x)^T\psi^K(t), \quad (\eta, t) \in \mathbb{R} \times [0, 1]^d.$$

Consider the class of functions $\mathcal{F}_n = \{f_{x,K_n} : x \in \mathcal{I}\}$. We shall apply
Theorem 2.1 to $\mathcal{F}_n$. Note that $W_n = \sup_{f \in \mathcal{F}_n}\mathbb{G}_n f$. First, we have
$|f_{x,K_n}(\eta, t)| \le b_{K_n}|\eta| =: F_n(\eta, t)$. Second, since

$$|\eta\,\alpha_{K_n}(x)^T\psi^{K_n}(t) - \eta\,\alpha_{K_n}(x')^T\psi^{K_n}(t)| \le L_{K_n}b_{K_n}|\eta||x - x'|, \quad \forall x, x' \in [0, 1]^d,$$

there is a constant $a > 0$ such that

$$\sup_Q N(\mathcal{F}_n, e_Q, \varepsilon\|F_n\|_{Q,2}) \le (aL_{K_n}/\varepsilon)^d, \quad 0 < \forall\varepsilon \le 1,\ \forall n \ge 1. \quad (22)$$

Hence, for every $n \ge 1$, there is a tight Gaussian random element $G_n$ in
$\ell^\infty(\mathcal{F}_n)$ with mean zero and covariance function

$$E[G_n(f)G_n(\check{f})] = \mathrm{Cov}(f(\eta_1, X_1), \check{f}(\eta_1, X_1)), \quad f, \check{f} \in \mathcal{F}_n.$$

To apply Theorem 2.1, we make some complementary calculations. Note
that, by (22), for every $\delta_n \downarrow 0$ with $\log(1/\delta_n) = O(\log n)$, $J(\delta_n, \mathcal{F}_n, F_n) =
O(\delta_n\sqrt{\log n})$. Let $D = \sup_{x \in [0,1]^d} E[|\eta_1|^4 \mid X_1 = x]$. Then for each $K \ge 1$,

$$E[|\eta_1\alpha_K(x)^T\psi^K(X_1)|^3] = E\big[E[|\eta_1|^3 \mid X_1]\,|\alpha_K(x)^T\psi^K(X_1)|^3\big] \le (1 + D)b_K E[|\alpha_K(x)^T\psi^K(X_1)|^2]$$

$$= (1 + D)b_K\,\alpha_K(x)^T E[\psi^K(X_1)\psi^K(X_1)^T]\alpha_K(x) = (1 + D)b_K,$$

and

$$E[|\eta_1\alpha_K(x)^T\psi^K(X_1)|^4] \le Db_K^2.$$

Thus, using Lemma 2.2, we have

$$E[\|E_n[|f(\eta_i, X_i)|^3]\|_{\mathcal{F}_n}] = O(b_{K_n} + n^{-1+3/q}b_{K_n}^3\log n),$$

$$E[\|\mathbb{G}_n\|_{\mathcal{F}_n \cdot \mathcal{F}_n}] = O(b_{K_n}\sqrt{\log n} + n^{-1/2+2/q}b_{K_n}^2\log n).$$

Choosing $\kappa = \kappa_n = \mathrm{const.} \times (b_{K_n}^{1/3} + n^{-1/3+1/q}b_{K_n}\log^{1/3} n)$, $\varepsilon = \varepsilon_n = n^{-1/2}$
and $\gamma = \gamma_n = (\log n)^{-1}$, we have

$$\Delta_n(\varepsilon_n, \gamma_n) = O(n^{-1/6}b_{K_n}^{1/3}\log n + n^{-1/4}b_{K_n}^{1/2}\log^{5/4} n + n^{-1/2+1/q}b_{K_n}\log^{3/2} n).$$

We wish to check that

$$E[(F_n/\kappa_n)^3 1(F_n/\kappa_n > c\gamma_n^{-1/3}n^{1/3}H_n(\varepsilon_n)^{-1/3})] = o(1).$$

Indeed, the left side is bounded by

$$\kappa_n^{-3}(c\gamma_n^{-1/3}\kappa_n n^{1/3}H_n(\varepsilon_n)^{-1/3})^{3-q}E[F_n^q] = O(n^{1-q/3}\kappa_n^{-q}b_{K_n}^q) = o(1).$$

Finally, let $B_n(x) = G_n(f_{x,K_n})$, $x \in \mathcal{I}$. Then $B_n$ is the desired Gaussian
process, and by Theorem 2.1, there is a sequence $\widetilde{W}_n$ of random variables
such that $\widetilde{W}_n \stackrel{d}{=} \sup_{f \in \mathcal{F}_n} G_n f = \sup_{x \in \mathcal{I}} B_n(x)$ and as $n \to \infty$, $|W_n - \widetilde{W}_n| =
O_P(\Delta_n(\varepsilon_n, \gamma_n))$. This completes the proof. □